Polar Bears & Principles

Amstrup et al. have just published a rebuttal of the Armstrong, Green & Soon critique of polar bear assessments. Polar bears aren’t my area, and I haven’t read the original, so I won’t comment on the ursine substance. However, Amstrup et al. reinforce many of my earlier objections to (mis)application of forecasting principles, so here are some excerpts:

The Principles of Forecasting and Their Use in Science

… AGS based their audit on the idea that comparison to their self-described principles of forecasting could produce a valid critique of scientific results. AGS (p. 383) claimed their principles ‘summarize all useful knowledge about forecasting.’ Anyone can claim to have a set of principles, and then criticize others for violating their principles. However, it takes more than a claim to create principles that are meaningful or useful. In concluding our rejoinder, we point out that the principles espoused by AGS are so deeply flawed that they provide no reliable basis for a rational critique or audit.

Failures of the Principles

Armstrong (2001) described 139 principles and the support for them. AGS (pp. 382–383) claimed that these principles are evidence based and scientific. They fail, however, to be evidence based or scientific on three main grounds: They use relative terms as if they were absolute, they lack theoretical and empirical support, and they do not follow the logical structure that scientific criticisms require.

Using Relative Terms as Absolute

Many of the 139 principles describe properties that models, methods, and (or) data should include. For example, the principles state that data sources should be diverse, methods should be simple, approaches should be complex, representations should be realistic, data should be reliable, measurement error should be low, explanations should be clear, etc. … However, it is impossible to look at a model, a method, or a datum and decide whether its properties meet or violate the principles because the properties of these principles are inherently relative.

Consider diverse. AGS faulted H6 for allegedly failing to use diverse sources of data. However, H6 used at least six different sources of data (mark-recapture data, radio telemetry data, data from the United States and Canada, satellite data, and oceanographic data). Is this a diverse set of data? It is more diverse than it would have been if some of the data had not been used. It is less diverse than it would have been if some (hypothetical) additional source of data had been included. To criticize it as not being diverse, however, without providing some measure of comparison, is meaningless.

Consider simple. What is simple? Although it might be possible to decide which of two models is simpler (although even this might not be easy), it is impossible, in principle, to say whether any model considered in isolation is simple or not. For example, H6 included a deterministic time-invariant population model. Is this model simple? It is certainly simpler than the stationary stochastic model or the nonstationary stochastic model also included in H6. However, without a measure of comparison, it is impossible to say which, if any, are ‘simple.’ For AGS to criticize the report as failing to use simple models is meaningless.
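
To see why ‘simple’ only makes sense as a comparison, here is a minimal, purely illustrative sketch, not the actual H6 or AMD models, contrasting a deterministic time-invariant population projection with a stationary stochastic one. The starting population, growth rate, variance, and horizon are made-up assumptions.

```python
import random

def deterministic_projection(n0, growth_rate, years):
    # Deterministic, time-invariant model: one fixed rate, one trajectory.
    return [n0 * (1 + growth_rate) ** t for t in range(years + 1)]

def stochastic_projection(n0, mean_growth, sd_growth, years, seed=0):
    # Stationary stochastic model: the growth rate is drawn each year from a
    # fixed distribution, so repeated runs give different trajectories.
    rng = random.Random(seed)
    traj = [n0]
    for _ in range(years):
        traj.append(traj[-1] * (1 + rng.gauss(mean_growth, sd_growth)))
    return traj

# Illustrative numbers only (hypothetical population and rates).
print(deterministic_projection(1500, -0.01, 10))
print(stochastic_projection(1500, -0.01, 0.05, 10))
```

Neither sketch is ‘simple’ in any absolute sense; the first is only simpler than the second, which is exactly the comparative judgment Amstrup et al. say a meaningful critique would have to supply.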

A Lack of Theoretical and Empirical Support

If the principles of forecasting are to serve as a basis for auditing the conclusions of scientific studies, they must have strong theoretical and (or) empirical support. Otherwise, how do we know that these principles are necessary for successful forecasts? Closer examination shows that although Armstrong (2001, p. 680) refers to evidence and AGS (pp. 382–383) call the principles evidence based, almost half (63 of 139) are supported only by received wisdom or common sense, with no additional empirical or theoretical support. …

Armstrong (2001, p. 680) defines received wisdom as when ‘the vast majority of experts agree,’ and common sense as when ‘it is difficult to imagine that things could be otherwise.’ In other words, nearly half of the principles are supported only by opinions, beliefs, and imagination about the way that forecasting should be done. This is not evidence based; therefore, it is inadequate as a basis for auditing scientific studies. … Even Armstrong’s (2001) own list includes at least three cases of principles that are supported by what he calls strong empirical evidence that ‘refutes received wisdom’; that is, at least three of the principles contradict received wisdom. …

Forecasting Audits Are Not Scientific Criticism

The AGS audit failed to distinguish between scientific forecasts and nonscientific forecasts. Scientific forecasts, because of their theoretical basis and logical structure based upon the concept of hypothesis testing, are almost always projections. That is, they have the logical form of ‘if X happens, then Y will follow.’ The analyses in AMD and H6 take exactly this form. A scientific criticism of such a forecast must show that even if X holds, Y does not, or need not, follow.

In contrast, the AGS audit simply scored violations of self-defined principles without showing how the identified violation might affect the projected result. For example, the accusation that H6 violated the commandment to use simple models is not a scientific criticism, because it says nothing about the relative simplicity of the model with respect to other possible choices. It also says nothing about whether the supposedly nonsimple model in question is in error. A scientific critique on the grounds of simplicity would have to identify a complexity in the model, and show that the complexity cannot be defended scientifically, that the complexity undermines the credibility of the model, and that a simpler model can resolve the issue. AGS did none of these.
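
The conditional structure Amstrup et al. describe can be made concrete. A projection of the form ‘if X happens, then Y will follow’ is, in effect, a function of an assumed scenario; a scientific criticism has to show that Y does not follow from X, not merely that the forecaster broke a rule. A trivial, hypothetical sketch (none of these numbers come from AMD or H6):

```python
def projected_population(initial, annual_change_under_scenario, years):
    # "If X happens" (the assumed annual change under a given scenario),
    # "then Y follows" (the projected population). The result is conditional
    # on the scenario, not an unconditional claim about the future.
    return initial * (1 + annual_change_under_scenario) ** years

# Two hypothetical scenarios; the projections differ because the assumptions do.
print(projected_population(1500, -0.02, 45))
print(projected_population(1500, 0.00, 45))
```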

There’s some irony to all this. Armstrong & Green criticize climate predictions as mere opinions cast in overly complex mathematical terms, lacking predictive skill. The instrument of their critique is a complex set of principles, mostly derived from opinions, with undemonstrated ability to predict the skill of models and forecasts.