The model that ate Europe

arXiv covers modeling on an epic scale in Europe’s Plan to Simulate the Entire Earth: a billion-dollar plan to build a huge infrastructure for global multiagent models. The core is a massive exaflop “Living Earth Simulator” – essentially the socioeconomic version of the Earth Simulator.

FuturIcT

I admire the audacity of this proposal, and there are many good ideas captured in one place:

  • The goal is to take on emergent phenomena like financial crises (getting away from the paradigm of incremental optimization of stable systems).
  • It embraces uncertainty and robustness through scenario analysis and Monte Carlo simulation.
  • It mixes modeling with data mining and visualization.
  • The general emphasis is on networks and multiagent simulations.

I have no doubt that there will be many interesting spinoffs from such a project. However, I suspect that the core goal of creating a realistic global model will be an epic failure, for three reasons. Continue reading “The model that ate Europe”

The 2009 World Energy Outlook

Following up on Carlos Ferreira’s comment, I looked up the new IEA WEO, unveiled today. A few excerpts from the executive summary:

  • The financial crisis has cast a shadow over whether all the energy investment needed to meet growing energy needs can be mobilised.
  • Continuing on today’s energy path, without any change in government policy, would mean rapidly increasing dependence on fossil fuels, with alarming consequences for climate change and energy security.
  • Non-OECD countries account for all of the projected growth in energy-related CO2 emissions to 2030.
  • The reductions in energy-related CO2 emissions required in the 450 Scenario (relative to the Reference Scenario) by 2020 — just a decade away — are formidable, but the financial crisis offers what may be a unique opportunity to take the necessary steps as the political mood shifts.
  • With a new international climate policy agreement, a comprehensive and rapid transformation in the way we produce, transport and use energy – a veritable low-carbon revolution – could put the world onto this 450-ppm trajectory.
  • Energy efficiency offers the biggest scope for cutting emissions.
  • The 450 Scenario entails $10.5 trillion more investment in energy infrastructure and energy-related capital stock globally than in the Reference Scenario through to the end of the projection period.
  • The cost of the additional investments needed to put the world onto a 450-ppm path is at least partly offset by economic, health and energy-security benefits.
  • In the 450 Scenario, world primary gas demand grows by 17% between 2007 and 2030, but is 17% lower in 2030 compared with the Reference Scenario.
  • The world’s remaining resources of natural gas are easily large enough to cover any conceivable rate of increase in demand through to 2030 and well beyond, though the cost of developing new resources is set to rise over the long term.
  • A glut of gas is looming.

This is pretty striking language, especially if you recall the much more business-as-usual tone of WEOs in the 90s.

Marginal Damage has (or will have) more.

Unprincipled Forecast Evaluation

I hadn’t noticed until I heard it here, but Armstrong & Green are back at it, with various claims that climate forecasts are worthless. In the Financial Post, they criticize the MIT Joint Program model,

… No more than 30% of forecasting principles were properly applied by the MIT modellers and 49 principles were violated. For an important problem such as this, we do not think it is defensible to violate a single principle.

As I wrote in some detail here, the Forecasting Principles are a useful seat-of-the-pants guide to good practices, but there’s no evidence that following them all is necessary or sufficient for a good outcome. Some are likely to be counterproductive in many situations, and key elements of good modeling practice are missing (for example, balancing units of measure).

It’s not clear to me that A&G really understand models and modeling. They seem to view everything through the lens of purely statistical methods like linear regression. Green recently wrote,

Another important principle is that the forecasting method should provide a realistic representation of the situation (Principle 7.2). An interesting statement in the MIT report that implies (as one would expect given the state of knowledge and omitted relationships) that the modelers have no idea to what extent their models provide a realistic representation of reality is as follows:

‘Changes in global surface average temperature result from a combination of emissions and climate parameters, and therefore two runs that look similar in terms of temperature may be very different in detail.’ (MIT Report p. 28)

While the modelers have sufficient latitude in their parameters to crudely reproduce a brief period of climate history, there is no reason to believe the models can provide useful forecasts.

What the MIT authors are saying, in essence, is that

T = f(E,P)

and that it is possible to achieve the same future temperature T with different combinations of emissions E and parameters P. Green seems to be taking a leap, to assume that historic T does not provide much constraint on P. First, that’s not necessarily true, given that historic E cannot be chosen freely. It could still be the case that the structure of f(E,P) means that historic T provides a weak constraint on P given E. But if that’s true (as it basically is), the problem is self-diagnosing: estimates of P will have broad confidence bounds, as will forecasts of T. Green completely ignores the MIT authors’ explicit characterization of this uncertainty. He also ignores the fact that the output of the model is not just T, and that we have priors for many elements of P (from more granular models or experiments, for example). Thus we have additional lines of evidence with which to constrain forecasts. Green also neglects to consider the implications of uncertainties in P that are jointly distributed in an offsetting manner (as is likely for climate sensitivity, ocean circulation, and aerosol forcing).
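
To make the self-diagnosing point concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the linear form of f, the function name temperature, the two-element parameter set P (a sensitivity and an aerosol offset, with the offset held constant into the future), and all of the numbers. It is not the MIT model. For each assumed sensitivity it solves for the aerosol offset that reproduces the same historic T, then compares the resulting forecasts.

```python
def temperature(ghg_forcing, sensitivity, aerosol_offset):
    """Toy f(E, P): warming is sensitivity times net forcing (assumed linear form)."""
    return sensitivity * (ghg_forcing - aerosol_offset)


# Hypothetical values, chosen only to make the arithmetic easy to follow.
HISTORIC_FORCING = 2.0  # historic E, expressed as a forcing
FUTURE_FORCING = 6.0    # future E under some scenario
OBSERVED_T = 0.8        # historic T that every parameter set must reproduce

# For each assumed sensitivity, pick the aerosol offset that fits history exactly.
sensitivities = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
candidates = [(s, HISTORIC_FORCING - OBSERVED_T / s) for s in sensitivities]

hindcasts = [temperature(HISTORIC_FORCING, s, a) for s, a in candidates]
forecasts = [temperature(FUTURE_FORCING, s, a) for s, a in candidates]

for (s, a), hind, fore in zip(candidates, hindcasts, forecasts):
    print(f"sensitivity={s:.1f}  aerosol_offset={a:.2f}  "
          f"historic T={hind:.2f}  future T={fore:.2f}")
print(f"forecast range: {min(forecasts):.1f} to {max(forecasts):.1f}")
```

In this toy, every parameter pair fits history equally well, yet the forecasts span roughly 2.8 to 4.8. The weak constraint does not hide; it shows up as a wide forecast range, which additional outputs or priors on the sensitivity would narrow.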

A&G provide no formal method to distinguish between situations in which models yield useful or spurious forecasts. In an earlier paper, they claimed rather broadly,

‘To our knowledge, there is no empirical evidence to suggest that presenting opinions in mathematical terms rather than in words will contribute to forecast accuracy.’ (page 1002)

This statement may be true in some settings, but obviously not in general. There are many situations in which mathematical models have good predictive power and outperform informal judgments by a wide margin.

A&G’s latest paper with Willie Soon, Validity of Climate Change Forecasting for Public Policy Decision Making, apparently forthcoming in IJF, is an attempt to make the distinction, i.e. to determine whether climate models have any utility as predictive tools. An excerpt from the abstract summarizes their argument:

Policymakers need to know whether prediction is possible and if so whether any proposed forecasting method will provide forecasts that are substantively more accurate than those from the relevant benchmark method. Inspection of global temperature data suggests that it is subject to irregular variations on all relevant time scales and that variations during the late 1900s were not unusual. In such a situation, a ‘no change’ extrapolation is an appropriate benchmark forecasting method. … The accuracy of forecasts from the benchmark is such that even perfect forecasts would be unlikely to help policymakers. … We nevertheless demonstrate the use of benchmarking with the example of the Intergovernmental Panel on Climate Change’s 1992 linear projection of long-term warming at a rate of 0.03°C-per-year. The small sample of errors from ex ante projections at 0.03°C-per-year for 1992 through 2008 was practically indistinguishable from the benchmark errors. … Again using the IPCC warming rate for our demonstration, we projected the rate successively over a period analogous to that envisaged in their scenario of exponential CO2 growth – the years 1851 to 1975. The errors from the projections were more than seven times greater than the errors from the benchmark method. Relative errors were larger for longer forecast horizons. Our validation exercise illustrates the importance of determining whether it is possible to obtain forecasts that are more useful than those from a simple benchmark before making expensive policy decisions.

There are many things wrong here:

  1. Demonstrating that unforced variability (history) can be adequately forecasted by a naive benchmark has no bearing on whether future forced variability will continue to be well-represented, or whether models can predict future emergence of a signal from noise. AG&S’ procedure is like watching an airplane taxi, concluding that aerodynamics knowledge is of no advantage, and predicting that the plane will remain on the ground forever.
  2. Comparing a naive forecast for global mean temperature against models amounts to a rejection of a vast amount of information. What is the naive forecast for the joint behavior of temperature, precipitation, lapse rates, sea level, and their spatial and seasonal patterns? These have been evaluated for models, but AG&S do not suggest benchmarks.
  3. A no-change forecast is not necessarily the best naive forecast for a series with unknown variability, if that series has some momentum or structure which can be exploited to do better. The particular no-change forecast selected by AG&S is suboptimal, because it uses a single year as a forecast, unnecessarily projecting annual variation into the future. In general, a stronger naive forecast (e.g., a smoothed value of a few recent years) would strengthen AG&S’ case, so it’s unclear why they’ve chosen an excessively naive benchmark. Fortunately, their base year, 1991, was rather “average”.
  4. The first exhibit presented is the EPICA ice core temperature. Roughly 85% of the data shown has a time interval too long to show century-scale temperature variations, and none of it could be expected to fully reveal decadal-scale variations, so it’s mostly irrelevant with respect to the kind of forecasts they seek to evaluate.
  5. The mere fact that a series has unknown historic variability does not mean that it cannot be forecast [corrected 8/18/09]. The EPICA and Vostok CO2 records look qualitatively much like the temperature record, yet CO2 accumulation in the atmosphere is quite predictable over decadal time scales, and models could handily beat a naive forecast.
  6. AG&S’ method of forecast evaluation unduly weights the short term, like the A&G sucker bet does. This is not strictly a problem, but it does make interpretation of the bounds on AG&S’ alternate forecast (“The benchmark forecast is that the global mean temperature for each year for the rest of this century will be within 0.5°C of the 2008 figure.”) a little tricky.
  7. The retrospective evaluation of the 1990/1992 IPCC projection of 0.3C/decade ignores many factors. First, 0.3C/decade over a century does not imply a smooth trend over short time scales; models and reality have substantial unforced variability which must be taken into account. The paragraph cited by AG&S includes the statement, “The rise will not be steady because of the influence of other factors.” Second, the 1992 report (in the very paragraph AG&S cite) notes that projections do not account for aerosols, so 0.3C/decade can’t be taken as a point prediction for the future, even if contingency on GHG emissions is resolved. Third, the IPCC projection stated approximate bounds – 0.2 to 0.5 C/decade – that should be accounted for in the evaluation, but are not. Still, the IPCC projection beats the naive benchmark.
  8. AG&S’ evaluation of the 0.3C/decade future BAU projection as a backcast over 1851-1975 is absurd. They write, “It is not unreasonable, then, to suppose for the purposes of our validation illustration that scientists in 1850 had noticed that the increasing industrialization of the world was resulting in exponential growth in ‘greenhouse gases’ and to project that this would lead to global warming of 0.03°C per year.” Actually, it’s completely unreasonable. Many figures in the 1990 FAR clearly indicate that the 0.3C/decade projection was not valid on [-infinity,infinity]. For example, figures 6, 8, and 9 from the SPM – just a few pages from material cited by AG&S – clearly show a gentle trend <0.05C/decade through 1950. Furthermore, even the most rudimentary understanding of the dynamics of GHG and heat accumulation is sufficient to realize that one would not expect a linear historic temperature trend to emerge from the emissions signal.
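
Point 3 can be checked on synthetic data. The sketch below is an illustration under stated assumptions, not AG&S’ procedure and not real temperature data: the series is stationary, independent Gaussian noise with a made-up standard deviation of 0.1, and benchmark_mae is a hypothetical helper. It compares a no-change forecast based on a single year with one based on the mean of the last five years.

```python
import random
from statistics import mean


def benchmark_mae(series, horizon, window, first_origin):
    """Mean absolute error of a no-change forecast.

    The forecast made at time t is the mean of the last `window` observations
    (window=1 is the single-year benchmark), and it is scored against the
    value `horizon` steps ahead.
    """
    errors = []
    for t in range(first_origin, len(series) - horizon):
        forecast = mean(series[t - window + 1 : t + 1])
        errors.append(abs(series[t + horizon] - forecast))
    return mean(errors)


# Synthetic stationary series: pure noise, no trend (an assumption for illustration).
rng = random.Random(0)
series = [rng.gauss(0.0, 0.1) for _ in range(2000)]

# Score both benchmarks over the same forecast origins.
single_year_mae = benchmark_mae(series, horizon=5, window=1, first_origin=4)
smoothed_mae = benchmark_mae(series, horizon=5, window=5, first_origin=4)

print(f"single-year no-change MAE: {single_year_mae:.4f}")
print(f"five-year-mean no-change MAE: {smoothed_mae:.4f}")
```

For independent noise, the single-year benchmark’s error variance is twice the noise variance, while a five-year mean cuts that to 1.2 times, so the smoothed benchmark should come out ahead. Real temperature series are autocorrelated, so the size of the gain would differ.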

How do AG&S arrive at this sorry state? Their article embodies a “sh!t happens” epistemology. They write, “The belief that ‘things have changed’ and the future cannot be judged by the past is common, but invalid.” The problem is, one can say with equal confidence that, “the belief that ‘things never change’ and the past reveals the future is common, but invalid.” In reality, there are predictable phenomena (the orbits of the planets) and unpredictable ones (the fall of the Berlin Wall). AG&S have failed to establish that climate is unpredictable or to provide us with an appropriate method for deciding whether it is predictable or not. Nor have they given us any insight into how to know or what to do if we can’t decide. Doing nothing because we think we don’t know anything is probably better than sacrificing virgins to the gods, but it doesn’t strike me as a robust strategy.

Another Look at Limits to Growth

I was just trying to decide whether I believed what I said recently, that the current economic crisis is difficult to attribute to environmental unsustainability. While I was pondering, I ran across this article by Graham Turner on the LtG wiki entry, which formally compares the original Limits runs to history over the last 30+ years. A sample:

Industrial output in Limits to Growth runs vs. history

The report basically finds what I’ve argued before: that history does not discredit Limits.

More Oil Price Forecasts

The history of long-term energy forecasting is a rather mixed bag. Supply and demand forecasts have generally been half decent, in terms of percent error, but that’s primarily because GDP growth is steady, energy intensity is price-inelastic, and there’s a lot of momentum in energy-consuming and energy-producing capital. Energy price forecasts, on the other hand, have generally been terrible. Consider the Delphi panel forecasts conducted by the CEC:

California Energy Commission Delphi Forecasts

In 1988, John Sterman showed that energy forecasts, even those using sophisticated models, were well represented by a simple adaptive rule: Continue reading “More Oil Price Forecasts”

Take the bet, Al

I’ve asserted here that the Global Warming Challenge is a sucker bet. I still think that’s true, but I may be wrong about the identity of the sucker. Here are the terms of the bet as of this writing:

The general objective of the challenge is to promote the proper use of science in formulating public policy. This involves such things as full disclosure of forecasting methods and data, and the proper testing of alternative methods. A specific objective is to develop useful methods to forecast global temperatures. Hopefully other competitors would join to show the value of their forecasting methods. These are objectives that we share and they can be achieved no matter who wins the challenge.

Al Gore is invited to select any currently available fully disclosed climate model to produce the forecasts (without human adjustments to the model’s forecasts). Scott Armstrong’s forecasts will be based on the naive (no-change) model; that is, for each of the ten years of the challenge, he will use the most recent year’s average temperature at each station as the forecast for each of the years in the future. The naïve model is a commonly used benchmark in assessing forecasting methods and it is a strong competitor when uncertainty is high or when improper forecasting methods have been used.

Specifically, the challenge will involve making forecasts for ten weather stations that are reliable and geographically dispersed. An independent panel composed of experts agreeable to both parties will designate the weather stations. Data from these sites will be listed on a public web site along with daily temperature readings and, when available, error scores for each contestant.

Starting at the beginning of 2008, one-year ahead forecasts then two-year ahead forecasts, and so on up to ten-year-ahead forecasts of annual ‘mean temperature’ will be made annually for each weather station for each of the next ten years. Forecasts must be submitted by the end of the first working day in January. Each calendar year would end on December 31.

The criteria for accuracy would be the average absolute forecast error at each weather station. Averages across stations would be made for each forecast horizon (e.g., for a six-year ahead forecast). Finally, simple unweighted averages will be made of the forecast errors across all forecast horizons. For example, the average across the two-year ahead forecast errors would receive the same weight as that across the nine-year-ahead forecast errors. This unweighted average would be used as the criterion for determining the winner.

I previously noted several problems with the bet:

The Global Warming Challenge is indeed a sucker bet, with terms slanted to favor the naive forecast. It focuses on temperature at just 10 specific stations over only 10 years, thus exploiting the facts that (a) GCMs do not have local resolution (their grids are typically several degrees); (b) GCMs, unlike weather models, do not have infrastructure for real-time updating of forcings and initial conditions; (c) ten stations is a pathetically small sample, and thus a low signal-to-noise ratio is expected under any circumstances; and (d) the decadal trend in global temperature is small compared to natural variability.

It’s actually worse than I initially thought. I assumed that Armstrong would determine the absolute error of the average across the 10 stations, rather than the average of the individual absolute errors. By the triangle inequality, the latter is always greater than or equal to the former, so this approach further worsens the signal-to-noise ratio and enhances the advantage of the naive forecast. In effect, the bet is 10 replications of a single-station test. But wait, there’s still more: the procedure involves simple, unweighted averages of errors across all horizons. But there will be only one 10-year forecast, two 9-year forecasts … , and ten 1-year forecasts. If the temperature and forecast are stationary, the errors at various horizons have the same magnitude, and the weighted average horizon is only four years. Even with other plausible assumptions, the average horizon of the experiment is much less than 10 years, further reducing the value of an accurate long-term climate model.
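
Both effects are easy to check with a few lines of Python. This is a sketch with hypothetical numbers: the ten station errors are invented for illustration, and the horizon counts simply follow the “one 10-year forecast, two 9-year forecasts … , and ten 1-year forecasts” tally above.

```python
from statistics import mean

# Hypothetical signed forecast errors at ten stations: a small shared bias
# of 0.1 plus station-level noise.
station_errors = [0.5, -0.2, 0.3, -0.4, 0.2, -0.1, 0.4, 0.0, 0.6, -0.3]

# Absolute error of the cross-station average (what I had assumed) ...
error_of_average = abs(mean(station_errors))
# ... versus the average of the individual absolute errors (what the bet uses).
average_of_errors = mean(abs(e) for e in station_errors)

print(f"|mean(errors)| = {error_of_average:.2f}")
print(f"mean(|errors|) = {average_of_errors:.2f}")

# Number of verifiable forecasts at each horizon over a 10-year challenge:
# ten 1-year forecasts, nine 2-year forecasts, ..., one 10-year forecast.
YEARS = 10
counts = {h: YEARS + 1 - h for h in range(1, YEARS + 1)}

# Average horizon when every individual forecast counts equally.
weighted_average_horizon = (
    sum(h * n for h, n in counts.items()) / sum(counts.values())
)
print(f"weighted average horizon = {weighted_average_horizon:.1f} years")
```

With these made-up errors, the station noise mostly cancels in the average (0.1) but not in the average of absolute errors (0.3), and the 55 forecasts have a count-weighted average horizon of 220/55 = 4 years.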

However, there is a silver lining. I have determined, by playing with the GHCN data, that Armstrong’s procedure can be reliably beaten by a simple extension of a physical climate model published a number of years ago. I’m busy and I have a high discount rate, so I will happily sell this procedure to the best reasonable offer (remember, you stand to make $10,000).

Update: I’m serious about this, by the way. It can be beaten.

More on Climate Predictions

No pun intended.

Scott Armstrong has again asserted on the JDM list that global warming forecasts are merely unscientific opinions (ignoring my prior objections to the claim). My response follows (a bit enhanced here, e.g., providing links).


Today would be an auspicious day to declare the death of climate science, but I’m afraid the announcement would be premature.

JDM researchers might be interested in the forecasts of global warming as they are based on unaided subjective forecasts (unaided by forecasting principles) entered into complex computer models.

This seems to say that climate scientists first form an opinion about the temperature in 2100, or perhaps about climate sensitivity to 2x CO2, then tweak their models to reproduce the desired result. This is a misperception about models and modeling. First, in a complex physical model, there is no direct way for opinions that represent outcomes (like climate sensitivity) to be “entered in.” Outcomes emerge from the specification and calibration process. In a complex, nonlinear, stochastic model it is rather difficult to get a desired behavior, particularly when the model must conform to data. Climate models are not just replicating the time series of global temperature; they first must replicate geographic and seasonal patterns of temperature and precipitation, vertical structure of the atmosphere, etc. With a model that takes hours or weeks to execute, it’s simply not practical to bend the results to reflect preconceived notions. Second, not all models are big and complex. Low-order energy balance models can be fully estimated from data, and still yield nonzero climate sensitivity.

I presume that the backing for the statement above is to be found in Green and Armstrong (2007), on which I have already commented here and on the JDM list. Continue reading “More on Climate Predictions”

Evidence on Climate Predictions

Last year, Kesten Green and Scott Armstrong published a critique of climate science, arguing that there are no valid scientific forecasts of climate. RealClimate mocked the paper, but didn’t really refute it. The paper came to my attention recently when Green & Armstrong attacked John Sterman and Linda Booth Sweeney’s paper on mental models of climate change.

I reviewed Green & Armstrong’s paper and concluded that their claims were overstated. I responded as follows: Continue reading “Evidence on Climate Predictions”