The Mysterious TREND Function

This post is literally trending.

Most System Dynamics software includes a pair of TREND and FORECAST functions. For historical reasons, these are typically the simplest possible first-order structure, which is fine for continuous, deterministic models, but not the best for applications with noise or real data. The waters are further muddied by the fact that Excel has a TREND function that’s really FORECASTing, plus newer FORECAST functions with methods that may differ from typical SD practice. Business Dynamics describes a third-order TREND function that’s considerably better for real-world applications.

As a result of all this variety, I think trend measurement and forecasting remain unnecessarily mysterious, so I built the model below to compare several approaches.

Goal

The point of TREND and FORECAST functions is to model the formation of expectations in a way that closely matches what people in the model are really doing.

This could mean a wide variety of things. In many cases, people aren’t devoting formal thought to observing and predicting the phenomenon of interest. In that case, adaptive expectations may be a good model. The implementation in SD is the SMOOTH function. Using a SMOOTH to set expectations says that people expect the future to be like the past, and they perceive changes in conditions only gradually. This is great if the forecasted variable is in fact stationary, or at least if changes are slow compared to the perception time. On the other hand, for a fast-evolving situation like COVID-19, delay can be fatal – literally.
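
To make that concrete, here’s a minimal sketch of adaptive expectations in Python (my own illustration, not part of the original model): a first-order SMOOTH integrated with a simple Euler step. Names and parameter values are arbitrary.

# Minimal sketch: adaptive expectations as a first-order SMOOTH,
# integrated with an Euler step (dt assumed small relative to the smoothing time).

def smooth_series(inputs, dt, smoothing_time, initial=None):
    """First-order smoothing: d(smoothed)/dt = (input - smoothed) / smoothing_time."""
    smoothed = inputs[0] if initial is None else initial
    out = []
    for x in inputs:
        smoothed += dt * (x - smoothed) / smoothing_time
        out.append(smoothed)
    return out

# Example: expectations about a ramping input lag it by roughly slope * smoothing time.
dt = 0.25            # months
sim_length = 120     # months simulated
ramp = [10 + 0.5 * i * dt for i in range(int(sim_length / dt))]
expected = smooth_series(ramp, dt, smoothing_time=12)
print(ramp[-1], expected[-1])   # the expectation trails the input by about 0.5 * 12 = 6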

For anything that is in fact changing (or that people perceive to be changing), it makes sense to project changes into the future with some kind of model. For a tiny fraction of reality, that might mean a sophisticated model: multiple regression, machine learning, or some kind of calibrated causal model, for example. However, most things are not subject to that kind of sophisticated scrutiny. Instead, expectations are likely to be formed by some kind of simple extrapolation of past trends into the future.

In some cases, things that are seemingly modeled in a sophisticated way may wind up looking a lot like extrapolation, due to human nature. The forecasters form a priori expectations of what “good” model projections look like, based on fairly naive adaptive-extrapolative expectations and social processes, and use those expectations to filter the results that are deemed acceptable. This makes the sophisticated results look a lot like extrapolation. However, the better the model, the harder it is for this to happen.

The goal, by the way, is generally not to use trend-like functions to make a forecast. Extrapolation may be perfectly reasonable in some cases, particularly where you don’t care too much about the outcome. But generally, you’re better off with a more sophisticated model – the whole point of SD and other methods is to address the feedback and nonlinearities that make extrapolation and other simpleminded methods go wrong. On the other hand, simple extrapolation may be great for creating a naive or null forecast to use as a benchmark for comparison with better approaches.

Basics

So, let’s suppose you want to model the expectations for something that people perceive to be (potentially) steadily increasing or decreasing. You can visit St. Louis FRED and find lots of economic series like this – GDP, prices, etc. Here’s the spot price of West Texas Intermediate crude oil:

Given this data, there are immediately lots of choices. Thinking about someone today making an investment conditional on future oil prices, should they extrapolate linearly (black and green lines) or exponentially (red line)? Should they use the whole series (black and red) or just the last few years (green)? Each of these implies a different forecast for the future.

Suppose we have some ideas about the forecast horizon, desired sensitivity to noise, etc. How do we actually establish a trend? One option is linear regression, which is just a formal way of eyeballing a straight line that fits some data. It works well, but has some drawbacks. First, it assigns equal weight to all the data throughout the interval, and zero weight to anything outside the interval. That may be a poor model for perceptual processes, where the most recent data has the greatest salience to the decision maker. Second, it’s computation- and storage-intensive: you have to do a lot of math, and keep track of every data point within the window of interest. That’s fine if it resides in a spreadsheet, but not if it resides in someone’s head.

Linear fit to a subset of the WTI spot price data.

The trend-like functions make an elegant simplification that addresses the drawbacks of regression. It’s based on the following observation:*

If, as above, you take a growing input (red line) and smooth it exponentially (using the SMOOTH function, or an equivalent first-order goal-gap structure), you get the blue line: another ramp that lags the input by a delay equal to the smoothing time. This means that, at month 400, we know two points: the current value of the input, and the current value of the smoothed input. But the smoothed value represents the past value of the input, in this case 60 months previous. So, we can use these two points to determine the slope of the red line:

(1) slope = (current - smoothed) / smoothing time

This is the slope in terms of input units per time. It’s often convenient to compute the fractional slope instead, expressing the growth as a fractional increase in the input per unit time:

(2) fractional slope = (current - smoothed) / smoothed / smoothing time

This is what the simple TREND functions in SD software typically report. Note that it blows up if the smoothed quantity reaches 0, while the linear method (1) does not.

If we think the growth is exponential, rather than a linear ramp, we can compute the growth rate in continuous time:

(3) fractional growth rate = LN( current / smoothed ) / smoothing time

This has pros and cons. Obviously, if a quantity is really growing exponentially, it should be measured that way. But if we’re modeling how people actually think, they may extrapolate linearly when the underlying behavior is exponential, thereby greatly underestimating future growth. Note that the very idea of forecasting exponentially assumes that the values involved are positive.

Once you know the slope of the (estimated) line, you can extrapolate it into the future via a method that corresponds with the measurement:

(1b) future value = current + slope * forecast horizon
(2b) future value = current * (1 + fractional slope * forecast horizon)
(3b) future value = current * EXP( fractional growth rate * forecast horizon )

Typical FORECAST functions use (2b).
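
For illustration, here’s a rough Python sketch of measurements (1)-(3) and extrapolations (1b)-(3b). The numbers in the example are made up, not taken from the WTI data.

import math

# Sketch of the slope measures (1)-(3) and the matching extrapolations (1b)-(3b).
# 'current' is the latest input; 'smoothed' is its first-order SMOOTH, which
# represents the input roughly one smoothing time in the past.

def forecasts(current, smoothed, smoothing_time, horizon):
    slope = (current - smoothed) / smoothing_time                      # (1)
    frac_slope = (current - smoothed) / smoothed / smoothing_time      # (2) blows up if smoothed -> 0
    growth_rate = math.log(current / smoothed) / smoothing_time        # (3) requires positive values
    return {
        "linear":      current + slope * horizon,                      # (1b)
        "fractional":  current * (1 + frac_slope * horizon),           # (2b) typical FORECAST function
        "exponential": current * math.exp(growth_rate * horizon),      # (3b)
    }

# Example: input at 100, its 60-month SMOOTH at 70, extrapolated 24 months ahead.
print(forecasts(current=100.0, smoothed=70.0, smoothing_time=60, horizon=24))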

*There’s a nice discussion of this in Appendix L of Industrial Dynamics, around figure L-3.

Refinements

The strategy above has the virtue of great simplicity: you only need to keep track of one extra stock, and the computation needed to extrapolate is minimal. It works great for continuous models. Unfortunately, it’s not very resistant to noise and discontinuities. Consider what happens if the input is not a smooth line, but a set of noisy points scattered around the line:

The SMOOTH function filters the data, so the past point (blue) may still be pretty close to the underlying input trend (red line). However, the extrapolation (orange line) relies only on the past point and the single current point. Any noise or discontinuity in the current point therefore can dramatically influence the slope estimate and future projections. This is not good.

Similar perverse behaviors happen if the input is a pulse or step function. For example:

Fortunately, simple functions can be saved. In Expectation Formation in Behavioral Simulation Models, John Sterman describes an alternative third-order TREND function that improves robustness and realism. The same structure can be found in the excellent discussion of expectations in Business Dynamics, Chapter 16.

I’ll leave the details to the article, but the basic procedure is as follows (a rough code sketch appears after the list):

  • Recognize that the input is not perceived instantaneously, but only after some delay (represented by smoothing). This might capture the fact that formal accounting procedures only report results with a lag, or that you only see the price of cheese at the supermarket intermittently.
  • Track a historic point (the Reference Condition), by smoothing, as in the simpler methods.
  • Measure the Indicated Trend as the fractional slope between the Perceived Present Condition and the Reference Condition.
  • Smooth the Indicated Trend again to form the final Perceived Trend. The smoothing prevents abrupt changes in the indicated trend from causing dramatic overshoots or undershoots in the trend estimate and extrapolations that use it.
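
Here’s a rough Python sketch of that structure – my paraphrase, not the authoritative formulation in the article; variable names and parameter values are mine.

import math

# Rough sketch of the third-order TREND structure outlined above (paraphrased;
# parameter values are arbitrary).

def trend3(inputs, dt, t_perceive=3.0, t_reference=12.0, t_trend=6.0):
    ppc = rc = inputs[0]        # perceived present condition, reference condition
    perceived_trend = 0.0
    out = []
    for x in inputs:
        indicated = (ppc - rc) / (rc * t_reference)    # fractional slope between PPC and RC
        perceived_trend += dt * (indicated - perceived_trend) / t_trend   # smooth the indicated trend
        rc += dt * (ppc - rc) / t_reference            # reference condition chases the PPC
        ppc += dt * (x - ppc) / t_perceive             # perception of the input itself is delayed
        out.append(perceived_trend)
    return out

# Example: for an input growing 2%/month, the perceived trend settles near 0.02/month.
dt = 0.25
series = [100 * math.exp(0.02 * i * dt) for i in range(int(240 / dt))]
print(trend3(series, dt)[-1])   # close to 0.02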

There’s an intermediate case that’s actually what I’m most likely to reach for when I need something like this: second-order smoothing. There are several very similar approaches in the statistical literature (see double exponential smoothing, for example). You have to be a little cautious, because these are often expressed in discrete time and therefore require a little thought to adapt to continuous time and/or unequal data intervals.

The version I use does the following:

(4) smoothed input = SMOOTH( input, smoothing time )

(5) linear trend = (input - smoothed input) / smoothing time
(6) smoothed trend = SMOOTH( linear trend, trend smoothing time )
(7) forecast = smoothed input + smoothed trend*(smoothing time + forecast horizon)
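
Here’s a minimal runnable sketch of (4)-(7) in Python, again with Euler integration; the names and parameter values are mine, not from the model.

# Sketch of the second-order smoothing forecast, equations (4)-(7).

def second_order_forecast(inputs, dt, smoothing_time=12.0, trend_smoothing_time=12.0, horizon=24.0):
    smoothed = inputs[0]      # (4) smoothed input
    smoothed_trend = 0.0      # (6) smoothed linear trend
    out = []
    for x in inputs:
        linear_trend = (x - smoothed) / smoothing_time                                  # (5)
        smoothed_trend += dt * (linear_trend - smoothed_trend) / trend_smoothing_time   # (6)
        smoothed += dt * (x - smoothed) / smoothing_time                                # (4)
        out.append(smoothed + smoothed_trend * (smoothing_time + horizon))              # (7)
    return out

# Example: for a noiseless ramp of 0.5/month, the forecast approaches input + 0.5 * horizon.
dt = 0.25
ramp = [10 + 0.5 * i * dt for i in range(int(240 / dt))]
print(ramp[-1] + 0.5 * 24)                    # ideal 24-month-ahead value
print(second_order_forecast(ramp, dt)[-1])    # close, up to Euler-step error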

This provides most of what you want in a simple extrapolation method. It largely ignores a PULSE disturbance. Overshoot is mild when presented with a STEP input (as long as the smoothing times are long enough). It largely rejects noise, but still tracks a real RAMP accurately.

Back to regression

SD models typically avoid linear regression, for reasons that are partly legitimate (as mentioned above). But it’s also partly cultural, as a reaction to incredibly stupid regressions that passed for models in other fields around the time of SD’s inception. We shouldn’t throw the baby out with that bathwater.

Fortunately, while most software doesn’t make linear regression particularly accessible, it turns out to be easy to implement an online regression algorithm with stocks and flows with no storage of data vectors required. The basic insight is that the regression slope (typically denoted beta) is given by:

(8) slope = covar(x,y) / var(x)

where x is time and y is the input to be forecasted. But var() and covar() are just sums of squares and cross products. If we’re OK with having exponential weighting of the regression, favoring more recent data, we can track these as moving sums (analogous to SMOOTHs). As a further simplification, as long as the smoothing window is not changing, we can compute var(x) directly from the smoothing window, so we only need to track the mean and covariance, yielding another second-order smoothing approach.
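
Here’s a rough sketch of that idea in Python (my own construction, not the model from the post). With exponential weighting over an effective window tau, the weighted mean of time is t - tau and var(time) = tau^2, so tracking two SMOOTHs – of y and of t*y – is enough to recover the regression slope.

# Rough sketch: online, exponentially weighted regression slope per equation (8),
# tracked with two first-order smooths. In continuous time the weighted mean of time
# is (t - tau) and var(time) = tau**2; the discrete Euler equivalents used below are
# (t - (tau - dt)) and tau * (tau - dt).

def online_slope(inputs, dt, tau=24.0):
    alpha = dt / tau
    sm_y = inputs[0]      # exponentially weighted mean of y
    sm_ty = 0.0           # exponentially weighted mean of t * y
    t = 0.0
    out = []
    for y in inputs:
        sm_y += alpha * (y - sm_y)
        sm_ty += alpha * (t * y - sm_ty)
        covar = sm_ty - (t - (tau - dt)) * sm_y   # covar(time, y)
        out.append(covar / (tau * (tau - dt)))    # (8) slope = covar(x, y) / var(x)
        t += dt
    return out

# Example: for a noiseless ramp of 0.5/month, the slope estimate converges to 0.5.
dt = 0.25
ramp = [10 + 0.5 * i * dt for i in range(int(480 / dt))]
print(round(online_slope(ramp, dt)[-1], 3))   # 0.5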

If the real decision makers inspiring your model are actually using linear regression, this may be a useful way to implement it. The implementation can be extended to equal weighting over a finite interval if needed. I find the second-order smoothing approach more intuitive, and it performs just as well, so I tend to prefer that in most cases.

Extensions

Most of what I’ve described above is linear, i.e. it assumes linear growth or decline of the quantity of interest. For a lot of things, exponential growth will be a better representation. Equations (3) and (3b) assume that, but any of the other methods can be adapted to assume exponential behavior by operating on the logarithm of the input, and then inverting that with exp(…) to form the final output.
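
As a tiny illustration (mine, not from the post), here’s the log-transform trick applied to the simple two-point method:

import math

# Sketch of the log-transform trick: apply a linear method to ln(input), then invert with exp().
# 'linear_forecast' stands in for any of the linear methods above; strictly, the smoothing
# itself should also operate on ln(input), but here given values are simply log-transformed.

def linear_forecast(current, smoothed, smoothing_time, horizon):
    slope = (current - smoothed) / smoothing_time          # method (1)
    return current + slope * horizon                       # method (1b)

def exponential_forecast(current, smoothed, smoothing_time, horizon):
    log_extrapolation = linear_forecast(math.log(current), math.log(smoothed),
                                        smoothing_time, horizon)
    return math.exp(log_extrapolation)

print(linear_forecast(100.0, 70.0, 60, 24))        # 112.0
print(exponential_forecast(100.0, 70.0, 60, 24))   # ~115.3, matching method (3b)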

All the models described here share one weakness: cyclical inputs.

When presented with a sine wave, the simplest approach – smoothing – just bulldozes through. The higher the frequency, the less of the signal passes into the forecast. The TREND function can follow a wave if the period is longer than the smoothing time. If the dynamics are faster, it starts to miss the turning points and overshoot dramatically. The higher-order methods are better, but still not really satisfactory. The bottom line is that your projection method must use a model capable of representing the signal, and none of the methods above embodies anything about cyclical behavior.

There are lots of statistical approaches to detection of seasonality, which you can google. Many involve binning techniques, similar to those described in Appendix N of Industrial Dynamics, Self Generated Seasonal Cycles.

The model

The Vensim model, with changes (.cin) files implementing some different experiments:

trendy 4b.zip

I developed this in DSS and forgot to test PLE before uploading. If you have issues, please comment and I’ll adapt it to work.

A Ventity version is on the way.

EIA projections – peak oil or snake oil?

Econbrowser has a nice post from Steven Kopits, documenting big changes in EIA oil forecasts. This graphic summarizes what’s happened:

Kopits’ comparison of successive EIA oil forecasts.
Click through for the original article.

As recently as 2007, the EIA saw a rosy future of oil supplies increasing with demand. It predicted oil consumption would rise by 15 mbpd to 2020, an ample amount to cover most eventualities. By 2030, the oil supply would reach nearly 118 mbpd, or 23 mbpd more than in 2006. But over time, this optimism has faded, with each succeeding year forecast lower than the year before. For 2030, the oil supply forecast has declined by 14 mbpd in only the last three years. This drop is as much as the combined output of Saudi Arabia and China.

In its forecast, the EIA, normally the cheerleader for production growth, has become amongst the most pessimistic forecasters around. For example, its forecasts to 2020 are 2-3 mbpd lower than that of traditionally dour Total, the French oil major. And they are below our own forecasts at Douglas-Westwood through 2020. As we are normally considered to be in the peak oil camp, the EIA’s forecast is nothing short of remarkable, and grim.

Is it right? In the last decade or so, the EIA’s forecast has inevitably proved too rosy by a margin. While SEC-approved prospectuses still routinely cite the EIA, those who deal with oil forecasts on a daily basis have come to discount the EIA as simply unreliable and inappropriate as a basis for investments or decision-making. But the EIA appears to have drawn a line in the sand with its new IEO and placed its fortunes firmly with the peak oil crowd. At least to 2020.

Since production is still rising, I think you’d have to call this “inflection point oil,” but as a commenter points out, it does imply peak conventional oil:

It’s also worth note that most of the liquids production increase from now to 2020 is projected to be unconventional in the IEO. Most of this is biofuels and oil sands. They REALLY ARE projecting flat oil production.

Since I’d looked at earlier AEO projections in the past, I wondered what early IEO projections looked like. Unfortunately I don’t have time to replicate the chart above and overlay the earlier projections, but here’s the 1995 projection:

Oil - IEO 1995

The 1995 projections put 2010 oil consumption at 87 to 95 million barrels per day. That’s a bit high, but not terribly inconsistent with reality and the new predictions (especially if the financial bubble hadn’t burst). Consumption growth is 1.5%/year.

And here’s 2002:

Oil - IEO 2002

In the 2002 projection, consumption is at 96 million barrels in 2010 and 119 million barrels in 2020 (waaay above reality and the 2007-2010 projections), a 2.2%/year growth rate.

I haven’t looked at all the interim versions, but somewhere along the way a lot of optimism crept in (and recently, crept out). In 2002 the IEO oil trajectory was generated by a model called WEPS, so I downloaded WEPS2002 to take a look. Unfortunately, it’s a typical open-loop spreadsheet horror show. My enthusiasm for a detailed audit is low, but it looks like oil demand is purely a function of GDP extrapolation and GDP-energy relationships, with no hint of supply-side dynamics (not even prices, unless they emerge from other models in a sneakernet portfolio approach). There’s no evidence of resources, not even synchronized drilling. No wonder users came to “discount the EIA as simply unreliable and inappropriate as a basis for investments or decision-making.”

Newer projections come from a new version, WEPS+. Hopefully it’s more internally consistent than the 2002 spreadsheet, and it does capture stock/flow dynamics and even includes resources. EIA appears to be getting better. But it appears that there’s still a fundamental problem with the paradigm: too much detail. There just isn’t any point in producing projections for dozens of countries, sectors and commodities two decades out, when uncertainty about basic dynamics renders the detail meaningless. It would be far better to work with simple models, capable of exploring the implications of structural uncertainty, in particular relaxing assumptions of equilibrium and idealized behavior.

Update: Michael Levi at the CFR blog points out that much of the difference in recent forecasts can be attributed to changes in GDP projections. Perhaps so. But I think this reinforces my point about detail, uncertainty, and transparency. If the model structure is basically consumption = f(GDP, price, elasticity) and those inputs have high variance, what’s the point of all that detail? It seems to me that the detail merely obscures the fundamentals of what’s going on, which is why there’s no simple discussion of reasons for the change in forecast.

Another Look at Limits to Growth

I was just trying to decide whether I believed what I said recently, that the current economic crisis is difficult to attribute to environmental unsustainability. While I was pondering, I ran across this article by Graham Turner on the LtG wiki entry, which formally compares the original Limits runs to history over the last 30+ years. A sample:

Industrial output in Limits to Growth runs vs. history

The report basically finds what I’ve argued before: that history does not discredit Limits.

More Oil Price Forecasts

The history of long term energy forecasting is a rather mixed bag. Supply and demand forecasts have generally been half decent, in terms of percent error, but that’s primarily because GDP growth is steady, energy intensity is price-inelastic, and there’s a lot of momentum in energy consuming and producing capital. Energy price forecasts, on the other hand, have generally been terrible. Consider the Delphi panel forecasts conducted by the CEC:

California Energy Commission Delphi Forecasts

In 1988, John Sterman showed that energy forecasts, even those using sophisticated models, were well represented by a simple adaptive rule.

SRES – We've got a bigger problem now

Recently Pielke, Wigley and Green discussed the implications of autonomous energy efficiency improvements (AEEI) in IPCC scenarios, provoking many replies. Some found the hubbub around the issue surprising, because the assumptions concerned were well known, at least to modelers. I was among the surprised, but sometimes the obvious needs to be restated loud and clear. I believe that there are several bigger elephants in the room that deserve such treatment. AEEI is important, as are other hotly debated SRES choices like PPP vs. MER, but at the end of the day, these are just parameter choices. In complex systems parameter uncertainty generally plays second fiddle to structural uncertainty. Integrated assessment models (IAMs) as a group frequently employ similar methods, e.g., dynamic general equilibrium, and leave crucial structural assumptions untested. I find it strange that the hottest debates surround biogeophysical models, which are actually much better grounded in physical principles, when socio-economic modeling is so uncertain.

Take the bet, Al

I’ve asserted here that the Global Warming Challenge is a sucker bet. I still think that’s true, but I may be wrong about the identity of the sucker. Here are the terms of the bet as of this writing:

The general objective of the challenge is to promote the proper use of science in formulating public policy. This involves such things as full disclosure of forecasting methods and data, and the proper testing of alternative methods. A specific objective is to develop useful methods to forecast global temperatures. Hopefully other competitors would join to show the value of their forecasting methods. These are objectives that we share and they can be achieved no matter who wins the challenge.

Al Gore is invited to select any currently available fully disclosed climate model to produce the forecasts (without human adjustments to the model’s forecasts). Scott Armstrong’s forecasts will be based on the naive (no-change) model; that is, for each of the ten years of the challenge, he will use the most recent year’s average temperature at each station as the forecast for each of the years in the future. The naïve model is a commonly used benchmark in assessing forecasting methods and it is a strong competitor when uncertainty is high or when improper forecasting methods have been used.

Specifically, the challenge will involve making forecasts for ten weather stations that are reliable and geographically dispersed. An independent panel composed of experts agreeable to both parties will designate the weather stations. Data from these sites will be listed on a public web site along with daily temperature readings and, when available, error scores for each contestant.

Starting at the beginning of 2008, one-year ahead forecasts then two-year ahead forecasts, and so on up to ten-year-ahead forecasts of annual ‘mean temperature’ will be made annually for each weather station for each of the next ten years. Forecasts must be submitted by the end of the first working day in January. Each calendar year would end on December 31.

The criteria for accuracy would be the average absolute forecast error at each weather station. Averages across stations would be made for each forecast horizon (e.g., for a six-year ahead forecast). Finally, simple unweighted averages will be made of the forecast errors across all forecast horizons. For example, the average across the two-year ahead forecast errors would receive the same weight as that across the nine-year-ahead forecast errors. This unweighted average would be used as the criterion for determining the winner.

I previously noted several problems with the bet:

The Global Warming Challenge is indeed a sucker bet, with terms slanted to favor the naive forecast. It focuses on temperature at just 10 specific stations over only 10 years, thus exploiting the facts that (a) GCMs do not have local resolution (their grids are typically several degrees) (b) GCMs, unlike weather models, do not have infrastructure for realtime updating of forcings and initial conditions (c) ten stations is a pathetically small sample, and thus a low signal-to-noise ratio is expected under any circumstances (d) the decadal trend in global temperature is small compared to natural variability.

It’s actually worse than I initially thought. I assumed that Armstrong would determine the absolute error of the average across the 10 stations, rather than the average of the individual absolute errors. By the triangle inequality, the latter is always greater than or equal to the former, so this approach further worsens the signal-to-noise ratio and enhances the advantage of the naive forecast. In effect, the bet is 10 replications of a single-station test. But wait, there’s still more: the procedure involves simple, unweighted averages of errors across all horizons. But there will be only one 10-year forecast, two 9-year forecasts … , and ten 1-year forecasts. If the temperature and forecast are stationary, the errors at various horizons have the same magnitude, and the weighted average horizon is only four years. Even with other plausible assumptions, the average horizon of the experiment is much less than 10 years, further reducing the value of an accurate long-term climate model.
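
A quick back-of-the-envelope check of that four-year figure (my arithmetic, assuming equal error magnitudes at every horizon):

# One 10-year-ahead forecast, two 9-year-ahead forecasts, ..., ten 1-year-ahead forecasts.
counts = {h: 11 - h for h in range(1, 11)}   # horizon (years) -> number of forecasts
avg_horizon = sum(h * n for h, n in counts.items()) / sum(counts.values())
print(avg_horizon)   # 4.0 years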

However, there is a silver lining. I have determined, by playing with the GHCN data, that Armstrong’s procedure can be reliably beaten by a simple extension of a physical climate model published a number of years ago. I’m busy and I have a high discount rate, so I will happily sell this procedure to the best reasonable offer (remember, you stand to make $10,000).

Update: I’m serious about this, by the way. It can be beaten.

More on Climate Predictions

No pun intended.

Scott Armstrong has again asserted on the JDM list that global warming forecasts are merely unscientific opinions (ignoring my prior objections to the claim). My response follows (a bit enhanced here, e.g., providing links).


Today would be an auspicious day to declare the death of climate science, but I’m afraid the announcement would be premature.

JDM researchers might be interested in the forecasts of global warming as they are based on unaided subjective forecasts (unaided by forecasting principles) entered into complex computer models.

This seems to say that climate scientists first form an opinion about the temperature in 2100, or perhaps about climate sensitivity to 2x CO2, then tweak their models to reproduce the desired result. This is a misperception about models and modeling. First, in a complex physical model, there is no direct way for opinions that represent outcomes (like climate sensitivity) to be “entered in.” Outcomes emerge from the specification and calibration process. In a complex, nonlinear, stochastic model it is rather difficult to get a desired behavior, particularly when the model must conform to data. Climate models are not just replicating the time series of global temperature; they first must replicate geographic and seasonal patterns of temperature and precipitation, vertical structure of the atmosphere, etc. With a model that takes hours or weeks to execute, it’s simply not practical to bend the results to reflect preconceived notions. Second, not all models are big and complex. Low order energy balance models can be fully estimated from data, and still yield nonzero climate sensitivity.

I presume that the backing for the statement above is to be found in Green and Armstrong (2007), on which I have already commented here and on the JDM list.

On Limits to Growth

It’s a good idea to read things you criticize; checking your sources doesn’t hurt either. One of the most frequent targets of uninformed criticism, passed down from teacher to student with nary a reference to the actual text, must be The Limits to Growth. In writing my recent review of Green & Armstrong (2007), I ran across this tidbit:

Complex models (those involving nonlinearities and interactions) harm accuracy because their errors multiply. Ascher (1978), refers to the Club of Rome’s 1972 forecasts where, unaware of the research on forecasting, the developers proudly proclaimed, “in our model about 100,000 relationships are stored in the computer.” (page 999)

Setting aside the erroneous attributions about complexity, I found the statement that the MIT world models contained 100,000 relationships surprising, as both can be diagrammed on a single large page. I looked up electronic copies of World Dynamics and World3, which have 123 and 373 equations respectively. A third or more of those are inconsequential coefficients or switches for policy experiments. So how did Ascher, or Ascher’s source, get to 100,000? Perhaps by multiplying by the number of time steps over the 200 year simulation period – hardly a relevant measure of complexity.

Meadows et al. tried to steer the reader away from focusing on point forecasts. The introduction to the simulation results reads,

Each of these variables is plotted on a different vertical scale. We have deliberately omitted the vertical scales and we have made the horizontal time scale somewhat vague because we want to emphasize the general behavior modes of these computer outputs, not the numerical values, which are only approximately known. (page 123)

Many critics have blithely ignored such admonitions, and other comments to the effect of, “this is a choice, not a forecast” or “more study is needed.” Often, critics don’t even refer to the World3 runs, which are inconvenient in that none reaches overshoot in the 20th century, making it hard to establish that “LTG predicted the end of the world in year XXXX, and it didn’t happen.” Instead, critics choose the year XXXX from a table of resource lifetime indices in the chapter on nonrenewable resources (page 56), which were not forecasts at all.

Evidence on Climate Predictions

Last year, Kesten Green and Scott Armstrong published a critique of climate science, arguing that there are no valid scientific forecasts of climate. RealClimate mocked the paper, but didn’t really refute it. The paper came to my attention recently when Green & Armstrong attacked John Sterman and Linda Booth Sweeney’s paper on mental models of climate change.

I reviewed Green & Armstrong’s paper, concluded that their claims were overstated, and responded accordingly.