Fit to data, good or evil?

The following is another extended excerpt from Jim Thompson and Jim Hines’ work on financial guarantee programs. The motivation was a client request for comparison of modeling results to data. The report pushes back a little, explaining some important limitations of model-data comparisons (though it ultimately also fulfills the request). I have a slightly different perspective, which I’ll try to indicate with some comments, but on the whole I find this to be an insightful and provocative essay.

First and foremost, we do not want to give credence to the erroneous belief that good models match historical time series and bad models don't. Second, we do not want to over-emphasize the importance of modeling to the process we have undertaken, nor to imply that modeling is an end-product.

In this report we indicate why a good match between simulated and historical time series is not always important or interesting, and how it can be misleading. Note that we are talking about comparing model output to historical time series. We do not address the separate issue of the use of data in creating a computer model. In fact, we made heavy use of data in constructing our model and interpreting its output — including firsthand experience, interviews, written descriptions, and time series.

This is a key point. Models that don't report fit to data are often accused of not using any. In fact, fit to numerical data is only one of a number of tests of model quality that can be performed; alone, it's rather weak. In a consulting engagement, I once ran across a marketing science model that yielded a spectacular fit of sales volume against data, given advertising, price, holidays, and other inputs – an R^2 of .95 or so. It turned out that the model was a linear regression with a “seasonality” parameter for every week. Because there were only 3 years of data, those 52 parameters were largely responsible for the good fit (R^2 fell to < .7 when they were omitted), and the underlying regression failed all kinds of reality checks.
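The dummy-variable inflation described above is easy to reproduce on synthetic data. The sketch below is entirely made up (it is not the consulting model): it generates "sales" as a mild trend plus noise, with no seasonality at all, and shows that adding a dummy parameter for each of the 52 weeks still buys a large chunk of R^2 purely by soaking up noise.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 156                        # three years of weekly observations
t = np.arange(n)
week = t % 52

# Hypothetical "sales": a mild trend plus noise -- no true seasonality.
y = 0.02 * t + rng.normal(0.0, 1.0, n)

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    return 1.0 - (resid @ resid) / tss

# Honest model: intercept + trend.
X_base = np.column_stack([np.ones(n), t])

# "Seasonal" model: the same, plus a dummy for each of the 52 weeks.
X_seasonal = np.column_stack(
    [X_base, (week[:, None] == np.arange(52)).astype(float)]
)

print(f"R^2 without weekly dummies: {r_squared(X_base, y):.2f}")
print(f"R^2 with 52 weekly dummies: {r_squared(X_seasonal, y):.2f}")
```

With only three observations per week-of-year dummy, each dummy can chase noise almost freely, so the "seasonal" R^2 always comes out higher even though the dummies encode nothing real.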

The model is part of a process

We have tried to construct logical arguments about endogenous processes that might cause future financial guarantee programs to go awry. A system dynamics process led to those arguments. Modeling is only part of that process. The process includes these steps:

1. Describing the problem as a behavior pattern over time — which leads to a more dynamic view of the problem, and a view that is amenable to the subsequent steps

2. Creating theories of the problem in terms of loopsets — each of which contains a complete chain of causality (i.e., the cause of everything in the loop is contained in the loop). Loops are the cornerstone of system dynamics; they are capable of creating behavior patterns all by themselves, and nothing else in the world can do so.

3. Linking the loopsets together — which gives a larger picture of the problem, preventing “myopia”, and which helps one understand how the theories tie together structurally.

4. Building computer model(s) — which causes one to be more specific about the structure of the theory.

5. Using the computer model to further explore the argument.

6. Articulating the theory in terms that an audience can understand.

Modeling is useful, even critical. But modeling is only part of the process and the other parts are critical, too. Furthermore, the model itself is an intermediate product in the larger process. The end-product of the process is a theory or logical argument about the real world.

In most of my work, I’d characterize the endpoint a little differently. It’s nice to have the theory, but the real end-product for me is a decision. The decision might be a one-off, or (more interestingly) a control strategy for achieving some desirable outcome in the distant future. In that case, I think the long-term role of data becomes more important, because it’s useful to have some way to track the outcome of the decision and decide if the theory was right. However, it would still be a bad tradeoff to use an open loop model to accommodate data.

Matching model output and historical time series is not always a useful test

Many people mistakenly believe that a good model will match historical time series from the real world and that a bad model will not. In fact, comparing model output to history cannot in the general case distinguish good models from bad models.

The mistaken belief that models should match history is based on certain specific classes of models or uses of models where such a match is important. For example, the correspondence between model output and historical time series will obviously be important for models used to investigate the reasonableness of the historical time series. Further, a large class of models are those whose output is taken as a prediction of the future. Here, someone wants someone else to believe the output of the model. Comparing model output to historical time series in effect puts a character witness on the stand: The model has not lied in the past. It is important to realize, however, that the importance of a match to history in these cases is quite specific to the particular needs of these classes of modeling effort.

Our use of models falls outside these classes: We have not used models either for prediction or to support our arguments and have not asked anybody to believe our models. We are not even sure what it would mean to believe our models — any more than we know what it would mean to believe our pencils or any other tool we used. Our use of models has had little to do with belief and a lot to do with argument creation. We used our models as aids in the design of our arguments. It is the argument that is important, not one of the tools that was useful in producing it.

In practice, I think there’s a lot to this statement. I interpret “the argument” as “mental models” and there’s certainly lots of evidence that formal models won’t change people’s behavior unless their mental models are improved in the process.

However, I’m a little uncomfortable with the idea of disposable models as an ideal. I think that’s very domain dependent. For most decisions, including day-to-day life, it’s not really practical to use formal models. But there are some big decisions, like what to do about climate change, that are intractably complex and resist “withering away of the model.” Arguments about climate tend to be rather unproductive, because it’s too easy to miss unstated auxiliary assumptions that differ. We’d probably be better off if better mental models were widespread, but I think formal models will remain an essential part of the process. Hopefully, making formal models more accessible will be the route to improved mental models.

More specifically, we use our models as analogies. Analogies are common and powerful ways of obtaining insight, but it is the insight that one wants to test, not the analogy.

Consider an example: A reporter for, say, Fortune Magazine might be asked to write something insightful about entrepreneurs. The reporter might hit on the idea that the relationship between the businessman and his business is like a marriage. In this case the model is the marriage and the real world is the entrepreneurial endeavor. The reporter would first think about his model: Marriages have their ups and downs and require perseverance. The next step is to ask whether ups, downs, and perseverance also characterize the real world of entrepreneurship. The answer is probably yes. But, perhaps the reporter knew that already, and so far the model has not yielded very much. Thinking further, the reporter might consider that marriages often produce children. This jogs his thinking: Perhaps the business itself is the “child” of the entrepreneur. In this case a host of possibilities arise: Perhaps entrepreneurs respond to threats to the business in a highly emotional way. This gets interesting: perhaps, the founder of a business will fail to act logically at key times when his business is threatened.

We can use the example to consider the relationship between models (here, the marriage) and arguments (here, entrepreneurs are emotionally tied to their business, and, as a consequence, they may fail to act logically at key times):

1. The model helped produce the argument or theory. But, the argument stands or falls on the basis of its own logic and whether it is consistent with information about entrepreneurs. The argument is disconnected from the model. In particular:

2. A fit between the model and the real world is not evidence in favor of the argument: The fact that parents love their children is not evidence that an entrepreneur will react emotionally to threats to his business. And,

3. A lack of fit does not invalidate the argument: Grandparents often care for children when parents go on vacation, but do not typically care for a business when entrepreneurs go on vacation. Here, the model does not fit the real world. But the lack of fit is irrelevant and does not invalidate the reporter’s argument.

Our use of computer models can be seen as a process of constructing an analogy of the real world and then using that analogy to jog our thinking along. A fit between the analogy and the real world is not the issue. Analogies — whether computer-based or not — are useful if they lead to better understanding and are not useful if they don’t.

For me, the key point here is that a model that doesn’t explain everything in the world is often still useful. One would hope that’s true in principle, since no model explains everything about the world. The boundaries of usefulness extend much further into the territory of poor-fit or data-free exploration than commonly assumed. However, I’m not sure I buy the idea that arguments are different from models. I think of arguments merely as special cases of models, usually at a lower level of formalism. That makes them less reliable, but also sometimes easier to construct, share, and use. Validating an argument is the same as validating a model: you confront it with as many tests as you can (conformance to rules of logic, laws of physics, behavior in extreme conditions, Occam’s razor, fit to data) and see if it breaks. If it doesn’t, the likelihood that the argument or model is correct is proportional to how hard you tried (how many alternative hypotheses you explored). Lack of fit doesn’t necessarily invalidate the model or argument, but sometimes it does.
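One of those tests, behavior in extreme conditions, is easy to mechanize. The sketch below uses a hypothetical toy stock-flow model (invented for illustration, not anything from the report) and checks that it stays physically sensible for inputs no historical time series would ever contain.

```python
import numpy as np

def inventory_sim(demand, steps=100, dt=1.0, target=100.0, adjust_time=4.0):
    """Toy stock-flow model: inventory adjusted toward a target, with
    shipments limited by what is actually on hand. Hypothetical structure,
    purely for illustrating extreme-condition testing."""
    inv = target
    path = []
    for _ in range(steps):
        shipments = min(demand, inv / dt)   # can't ship what you don't have
        production = max(0.0, (target - inv) / adjust_time) + demand
        inv += dt * (production - shipments)
        path.append(inv)
    return np.array(path)

# Extreme-condition tests: the stock must never go negative, even for
# inputs far outside anything the data contains.
assert (inventory_sim(demand=0.0) >= 0).all()    # demand collapses to zero
assert (inventory_sim(demand=1e6) >= 0).all()    # absurdly large demand
```

A model can match history perfectly and still fail checks like these, which is one reason fit to data alone is a weak test.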

The Problem of Missing Structure

A question might arise whether it would be useful to examine a run that started from conditions similar to those of a particular year and that used exogenous inputs mirroring real-world disturbances. The answer is that one would still not have a useful test.

Our model — constructed to explore financial arguments — lacks structure that is also moving the data around. Consequently, the one path the model output should not follow precisely is the historical path.

We could still tune the model to match the data quite precisely. To do so we would need to choose parameters that would make the structure we do include mimic the behavior of structure that we do not include. This would not tell us much about our model or the parameters. Tuning our model to the data would not make our model more useful, nor our arguments truer.
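A toy illustration of that point (synthetic numbers, nothing to do with the actual financial model): generate data from two decay processes, then tune a model containing only one of them. The tuned parameter fits the data well, but it measures neither real process; it is compensating for the missing structure.

```python
import numpy as np

t = np.linspace(0.0, 50.0, 200)

# "Real world": two decay processes, only one of which our model includes.
true_fast, true_slow = 0.10, 0.02
data = np.exp(-true_fast * t) + 0.5 * np.exp(-true_slow * t)

def fit_single_decay(t, y, ks):
    """Tune a one-process model A*exp(-k*t) to the data by least squares,
    searching a grid of candidate rates k."""
    best = None
    for k in ks:
        basis = np.exp(-k * t)
        a = (basis @ y) / (basis @ basis)       # closed-form amplitude
        sse = np.sum((y - a * basis) ** 2)
        if best is None or sse < best[2]:
            best = (k, a, sse)
    return best

k_hat, a_hat, _ = fit_single_decay(t, data, np.linspace(0.005, 0.2, 400))
# k_hat lands strictly between the two true rates: the tuned parameter
# is absorbing the missing slow process, not measuring anything real.
print(f"tuned rate k_hat = {k_hat:.3f} (true rates: {true_fast}, {true_slow})")
```

The fit can look excellent, yet the calibrated parameter is an artifact of the omitted structure, which is exactly why a tuned match to history says so little about the model.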

This is perhaps the most crucial point. I’ve just run across a great illustration of a model, derived from first principles, then improperly calibrated to an incomplete dataset, yielding results that fit history but mislead about the future (while the original first principles version was truer). I hope to write that up in the next few days – stay tuned.

Ultimately, I’d argue that a model incorporating numerical data and other information is better than one using only other information, all else equal. However, all else is not equal. Incorporating numerical data comes at the cost of added structure, collection effort, computational slowdown, loss of transparency, and greater difficulty performing key experiments like disturbances from equilibrium. That cost can be very high, partly because so much data is rubbish (perhaps because the only way to discover its faults is to tune a dynamic model to it, which few bother to do). There can also be a big payoff, when you discover new features of the data, rule out theories that seemed plausible from first principles, or create a model that works as an embedded system to transform data into information. Sometimes the stakes are high enough to make data-intensive modeling worthwhile, and sometimes they aren’t.

Again, thanks to Jim Thompson for scanning the originals. Apologies for typos – the OCR tends to mix up l, t and f a fol.
