Using climate models to estimate the quality of global observational data sets

François Massonnet, Omar Bellprat, Virginie Guemas, Francisco J. Doblas-Reyes (2016), Science, 28 October, pp. 452–455

Introduction

There is now overwhelming evidence that Earth's climate has changed at an unusually rapid pace during the last century, that these changes bear a clear human signature, and that they will be enhanced if anthropogenic emissions continue unabated. The development of large-scale observational networks has been a major step toward reaching such levels of evidence. Observations of essential climate variables [e.g., sea surface temperature (SST), sea ice extent (2)] are indeed central to the study of climate variability (1), to the detection and attribution of human-induced climate change (1, 3), and to constraining long-term projections (1, 4). Major international and coordinated observing programs are currently underway to continue these efforts (5).

However, with the emergence of multiple observational references (ORs), sometimes divergent, a natural question arises: What is the underlying quality of these products? A direct answer to this question is not easily achieved because there is, by definition, no universal knowledge of the true state of our climate (6). Here we present a framework for the evaluation of ORs that addresses this gap. The approach relies on the use of climate models taken as references, and not as subjects of assessment, as has been widely done in the past (7, 8). The rationale behind this approach rests on the so-called "truth-plus-noise" paradigm (9–14), which assumes that observations and models are both noisy versions of the true (but unknown) state of the climate system. In that view, observations and models play symmetrical roles, so that it is possible to use one to estimate how close the other is to the true state, and vice versa. In line with this paradigm, we claim that climate models can be appropriate tools for estimating the quality of ORs.

We accumulate the necessary evidence in three steps. First, we rely on elementary logic and take advantage of the symmetry of common metrics of model performance. Then, we show with a simple statistical toy model how observational error can degrade model performance (symmetrically to model error), turning this into an opportunity to reverse the process of model evaluation into one of OR evaluation. Finally, we apply the proposed procedure to a realistic test case involving simulations conducted with large-scale general circulation models and a set of ORs.
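To make the first step concrete: common skill metrics such as the root-mean-square error (RMSE) treat their two arguments interchangeably. The paper publishes no code, so the sketch below is a minimal illustration of our own; the data values and the names rmse, model, and obs are hypothetical.

```python
import numpy as np

def rmse(x, y):
    """Root-mean-square error; x and y enter symmetrically."""
    return np.sqrt(np.mean((np.asarray(x) - np.asarray(y)) ** 2))

model = np.array([14.2, 14.5, 14.9, 15.1])  # hypothetical model SSTs (deg C)
obs = np.array([14.0, 14.6, 15.0, 15.3])    # hypothetical observational reference

# Swapping the roles of "reference" and "subject of assessment"
# leaves the score unchanged: the metric alone cannot say which
# series is being evaluated against which.
assert np.isclose(rmse(model, obs), rmse(obs, model))
```

This symmetry is the elementary-logic ingredient: a mismatch score between a model and an OR constrains both, not only the model.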
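The second step, the statistical toy model, can be sketched in the same spirit. Under our own simplifying assumptions (a common truth with independent Gaussian errors added to the model and to the OR; the noise levels sigma_mod and sigma_obs are hypothetical, not taken from the paper), a short Monte Carlo run shows that observational error inflates the measured model-observation mismatch exactly as model error of the same size would:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000                          # Monte Carlo sample size

truth = rng.normal(0.0, 1.0, n)      # the true (in practice unknown) state
sigma_mod, sigma_obs = 0.3, 0.5      # assumed error standard deviations

model = truth + rng.normal(0.0, sigma_mod, n)  # model = truth + model noise
obs = truth + rng.normal(0.0, sigma_obs, n)    # observation = truth + obs noise

# With independent errors, E[(model - obs)^2] = sigma_mod^2 + sigma_obs^2,
# so error in the OR degrades the apparent model performance symmetrically
# to error in the model. This is what allows the direction of the
# evaluation to be reversed, from model assessment to OR assessment.
mse = np.mean((model - obs) ** 2)
print(mse, sigma_mod**2 + sigma_obs**2)  # these two numbers nearly agree
```

Because the model-observation mean squared error is the sum of the two error variances, an independent estimate of one term turns the measured mismatch into a constraint on the other, in either direction.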