Don't just do something, stand there! Reflections on the counterintuitive behavior of complex systems, seen through the eyes of System Dynamics, Systems Thinking and simulation.
I sat down over lunch to develop a stock-flow diagram with my kids. This is what happens when you teach system dynamics to young boys:
Notice that there’s no outflow for the unpleasantries, because they couldn’t agree on whether the uptake mechanism was chemical reaction or physical transport.
Along the way, we made a process observation. We started off quiet, but gradually talked louder and louder until we were practically shouting at each other. The boys were quick to identify the dynamic:
Jay Forrester always advocates tackling the biggest problems, because they’re no harder to solve than trivial ones, but sometimes it’s refreshing to lighten up and take on systems of limited importance.
Other rodents also rebounded, turning to seabird chicks for food
An expensive pan-rodent eradication plan is now underway.
But this time, administrators are prepared to make course corrections if things do not turn out according to plan. “This study clearly demonstrates that when you’re doing a removal effort, you don’t know exactly what the outcome will be,” said Barry Rice, an invasive species specialist at the Nature Conservancy. “You can’t just go in and make a single surgical strike. Every kind of management you do is going to cause some damage.”
I was rereading The Fifth Discipline on the way to Boston the other day, and something got me started on this. Wrath, greed, sloth, pride, lust, envy, and gluttony are the downfall of individuals, but what about the downfall of systems? Here’s my list, in no particular order:
Information pollution. Sometimes known as lying, but also common in milder forms, such as greenwash. Example: twenty years ago, the “recycled” symbol was redefined to mean “recyclable” – a big dilution of meaning.
Elimination of diversity. Example: overconsolidation of industries (finance, telecom, …). As Jay Forrester reportedly said, “free trade is a mechanism for allowing all regions to reach all limits at once.”
Changing the top-level rules in pursuit of personal gain. Example: the Starpower game. As long as we pretend to want to maximize welfare in some broad sense, the system rules need to provide an equitable framework, within which individuals can pursue self-interest.
Certainty. Planning for it leads to fragile strategies. If you can’t imagine a way you could be wrong, you’re probably a fanatic.
Elimination of slack. Normally this is regarded as a form of optimization, but a system without any slack can’t change (except catastrophically). How are teachers supposed to improve their teaching when every minute is filled with requirements?
Superstition. Attribution of cause by correlation or coincidence, including misapplied pattern-matching.
The four horsemen from classic SD work on flawed mental models: linear, static, open-loop, laundry-list thinking.
That’s seven (cheating a little). But I think there are more candidates that don’t quite make the big time:
Impatience. Don’t just do something, stand there. Sometimes.
Failure to account for delays.
Abstention from top-level decision making (essentially not voting).
The very idea of compiling such a list only makes sense if we’re talking about the downfall of human systems, or systems managed for the benefit of “us” in some loose sense, but perhaps anthropocentrism is a sin in itself.
I’m sure others can think of more! I’d be interested to hear about them in comments.
The following is another extended excerpt from Jim Thompson and Jim Hines’ work on financial guarantee programs. The motivation was a client request for comparison of modeling results to data. The report pushes back a little, explaining some important limitations of model-data comparisons (though it ultimately also fulfills the request). I have a slightly different perspective, which I’ll try to indicate with some comments, but on the whole I find this to be an insightful and provocative essay.
First and Foremost, we do not want to give credence to the erroneous belief that good models match historical time series and bad models don’t. Second, we do not want to over-emphasize the importance of modeling to the process which we have undertaken, nor to imply that modeling is an end-product.
In this report we indicate why a good match between simulated and historical time series is not always important or interesting and how it can be misleading. Note we are talking about comparing model output and historical time series. We do not address the separate issue of the use of data in creating a computer model. In fact, we made heavy use of data in constructing our model and interpreting the output – including first hand experience, interviews, written descriptions, and time series.
This is a key point. Models that don’t report fit to data are often accused of not using any. In fact, fit to numerical data is only one of a number of tests of model quality that can be performed. Alone, it’s rather weak. In a consulting engagement, I once ran across a marketing science model that yielded a spectacular fit of sales volume against data, given advertising, price, holidays, and other inputs – R^2 of .95 or so. It turns out that the model was a linear regression, with a “seasonality” parameter for every week. Because there were only 3 years of data, those 52 parameters were largely responsible for the good fit (R^2 fell to < .7 if they were omitted). The underlying model was a linear regression that failed all kinds of reality checks.
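To see how much of a good fit per-week seasonality parameters can manufacture on their own, here is a minimal sketch. It is not the marketing model described above: it regresses pure random noise on nothing but 52 week-of-year dummies over 3 years of weekly data. The function name, the seed, and the use of Gaussian noise are assumptions made for illustration. With only dummies in the regression, the least-squares fitted value for each observation is simply the mean of the observations that share its week, so no matrix algebra is needed.

```python
import random


def seasonal_dummy_r2(y, period=52):
    """R^2 of an OLS fit whose only regressors are one dummy per week-of-year.

    With dummies alone, the least-squares fitted value for an observation
    is the mean of all observations that fall in the same week.
    """
    n = len(y)
    grand_mean = sum(y) / n

    # Group observations by week-of-year.
    groups = {}
    for t, value in enumerate(y):
        groups.setdefault(t % period, []).append(value)
    week_mean = {week: sum(vals) / len(vals) for week, vals in groups.items()}

    ss_total = sum((value - grand_mean) ** 2 for value in y)
    ss_resid = sum((value - week_mean[t % period]) ** 2 for t, value in enumerate(y))
    return 1.0 - ss_resid / ss_total


random.seed(1)  # arbitrary seed, only for repeatability
noise = [random.gauss(0.0, 1.0) for _ in range(3 * 52)]  # 3 years of weekly "sales"
r2 = seasonal_dummy_r2(noise)
print(f"R^2 from 52 weekly dummies on pure noise: {r2:.2f}")
```

For pure noise, the expected R^2 of such a fit is (52 - 1) / (156 - 1), or about 0.33, even though there is nothing to explain. That is why a high R^2 from a heavily parameterized regression says little about whether the structure of the model is right.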
Ever since the housing market fell apart, I’ve been meaning to write about some excellent work on federal financial guarantee programs, by colleagues Jim Hines (of TUI fame) and Jim Thompson.
Designing Programs that Work.
This document is part of a series reporting on a study of federal financial guarantee programs. The study is concerned with how to design future guarantee programs so that they will be more robust, less prone to problems. Our focus has been on internal (that is, endogenous) weaknesses that might inadvertently be designed into new programs. Such weaknesses may be described in terms of causal loops. Consequently, the study is concerned with (a) identifying the causal loops that can give rise to problematic behavior patterns over time, and (b) considering how those loops might be better controlled.
Their research dates back to 1993, when I was a naive first-year PhD student, but it’s not a bit dated. Rather, it’s prescient. It considers a series of design issues that arise with the creation of government-backed entities (GBEs). From today’s perspective, many of the features identified were the seeds of the current crisis. Jim^2 identify a number of structural innovations that control the undesirable behaviors of the system. It’s evident that many of these were not implemented, and from what I can see won’t be this time around either.
There’s a sophisticated model beneath all of this work, but the presentation is a nice example of a nontechnical narrative. The story, in text and pictures, is compelling because the modeling provided internal consistency and insights that would not have been available through debate or navel rumination alone.
I don’t have time to comment too deeply, so I’ll just provide some juicy excerpts, and you can read the report for details:
The profit-lending-default spiral
The situation described here is one in which an intended corrective process is weakened or reversed by an unintended self-reinforcing process. The corrective process is one in which inadequate profits are corrected by rising income on an increasing portfolio. The unintended self-reinforcing process is one in which inadequate profits are met with reduced credit standards which cause higher defaults and a further deterioration in profits. Because the fee and interest income from a loan begins to be received immediately, it may appear at first that the corrective process dominates, even if the self-reinforcing is actually dominant. Managers or regulators initially may be encouraged by the results of credit loosening and portfolio building, only to be surprised later by a rising tide of bad news.
As is typical, some well-intentioned policies that could mitigate the problem behavior have unpleasant side-effects. For example, adding risk-based premiums for guarantees worsens the short-term pressure on profits when standards erode, creating a positive loop that could further drive erosion.
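The timing asymmetry at the heart of the profit-lending-default spiral (income arrives immediately, defaults arrive later) can be illustrated with a toy cash-flow calculation. This is a rough sketch under invented assumptions, not the Hines and Thompson model: a single cohort of loans made under loosened standards, a flat income rate, and a lump of defaults that surfaces after a fixed delay. Every parameter value and name below is hypothetical.

```python
def loosened_cohort_profit(principal=100.0, income_rate=0.05,
                           default_fraction=0.30, default_delay=3, years=8):
    """Yearly profit from one cohort of loans made under loosened standards.

    All parameters are illustrative assumptions: fee and interest income
    accrue on the performing balance from year 0, while defaults are only
    recognized in year `default_delay`.
    """
    profits = []
    performing = principal
    for year in range(years):
        loss = 0.0
        if year == default_delay:
            # Defaults surface only after the delay and are written off.
            loss = default_fraction * performing
            performing -= loss
        profits.append(income_rate * performing - loss)
    return profits


profits = loosened_cohort_profit()
cumulative = []
running = 0.0
for p in profits:
    running += p
    cumulative.append(running)

for year, (p, c) in enumerate(zip(profits, cumulative)):
    print(f"year {year}: profit {p:7.2f}, cumulative {c:7.2f}")
```

In the early years the cohort looks like a success, which is exactly the signal that could encourage managers or regulators to loosen further before the losses arrive.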
There are lots of good reasons for building models without data. However, if you want to measure something (i.e. estimate model parameters), produce results that are closely calibrated to history, or drive your model with historical inputs, you need data. Most statistical modeling you’ll see involves static or dynamically simple models and well-behaved datasets: nice flat files with uniform time steps, units matching (or, alarmingly, ignored), and no missing points. Things are generally much messier with a system dynamics model, which typically has broad scope and (one would hope) lots of dynamics. The diversity of data needed to accompany a model presents several challenges:
disagreement among sources
missing data points
non-uniform time intervals
variable quality of measurements
diverse source formats (spreadsheets, text files, databases)
The mathematics for handling the technical estimation problems were developed by Fred Schweppe and others at MIT decades ago. David Peterson’s thesis lays out the details for SD-type models, and most of the functionality described is built into Vensim. It’s also possible, of course, to go a simpler route; even hand calibration is often effective and reasonably quick when coupled with Synthesim.
Either way, you have to get your data corralled first. For a simple model, I’ll build the data right into the dynamic model. But for complicated models, I usually don’t want the main model bogged down with units conversions and links to a zillion files. In that case, I first build a separate datamodel, which does all the integration and passes cleaned-up series to the main model as a fast binary file (an ordinary Vensim .vdf). In creating the data infrastructure, I try to maximize three things:
Replicability. Minimize the number of manual steps in the process by making the data model do everything. Connect the datamodel directly to primary sources, in formats as close as possible to the original. Automate multiple steps with command scripts. Never use hand calculations scribbled on a piece of paper, unless you’re scrupulous about lab notebooks, or note the details in equations’ documentation field.
Transparency. Often this means “don’t do complex calculations in spreadsheets.” Spreadsheets are very good at some things, like serving as a data container that gives good visibility. However, spreadsheet calculations are error-prone and hard to audit. So, I try to do everything, from units conversions to interpolation, in Vensim.
Quality. #1 and #2 already go a long way toward ensuring quality. However, it’s possible to go further. First, actually look at the data. Take time to build a panel of on-screen graphs so that problems are instantly visible. Use a statistics or visualization package to explore it. Lately, I’ve been going a step farther, by writing Reality Checks to automatically test for discontinuities and other undesirable properties of spliced time series. This works well when the data is simply too voluminous to check manually.
This can be quite a bit of work up front, but the payoff is large: less model rework later, easy updates, and higher quality. It’s also easier to generate graphics or statistics that help others to gain confidence in the model, though it’s sometimes important to help them recognize that goodness of fit is a weak test of quality.
It’s good to build the data infrastructure before you start modeling, because that way your drivers and quality control checks are in place as you build structure, so you avoid the pitfalls of an end-of-pipe inspection process. A frequent finding in our corporate work has been that cherished data is in fact rubbish, or means something quite different than what users have historically assumed. Ventana colleague Bill Arthur argues that modern IT practices are making the situation worse, not better, because firms aren’t retaining data as long (perhaps a misplaced side effect of a mania for freshness).
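The automated test for discontinuities in spliced series mentioned above is done with Vensim Reality Checks in the original workflow. As a language-neutral illustration of the same idea, here is a minimal Python sketch. The rule it uses (flag any step that is more than a chosen multiple of the median step size) and the function name are assumptions for illustration, not a description of how Reality Checks work.

```python
from statistics import median


def find_discontinuities(values, factor=5.0):
    """Return indices where a series jumps far more than is typical.

    A step counts as a jump when its absolute size exceeds `factor` times
    the median absolute step. Both the rule and the default factor are
    illustrative assumptions; a real check would be tuned to the data.
    """
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    if not steps:
        return []
    typical = median(steps)
    # Index i + 1 is the first point after the suspicious step.
    return [i + 1 for i, step in enumerate(steps) if step > factor * typical]


# Two sources spliced between the 4th and 5th points, with a level mismatch.
spliced = [10, 11, 12, 13, 23, 24, 25]
print(find_discontinuities(spliced))  # → [4]
```

A check like this is crude, but it scales to hundreds of series, which is the situation where checking by eye stops being practical.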
We report experiments assessing people’s intuitive understanding of climate change. We presented highly educated graduate students with descriptions of greenhouse warming drawn from the IPCC’s nontechnical reports. Subjects were then asked to identify the likely response to various scenarios for CO2 emissions or concentrations. The tasks require no mathematics, only an understanding of stocks and flows and basic facts about climate change. Overall performance was poor. Subjects often select trajectories that violate conservation of matter. Many believe temperature responds immediately to changes in CO2 emissions or concentrations. Still more believe that stabilizing emissions near current rates would stabilize the climate, when in fact emissions would continue to exceed removal, increasing GHG concentrations and radiative forcing. Such beliefs support wait and see policies, but violate basic laws of physics.
The climate bathtubs are really a chain of stock processes: accumulation of CO2 in the atmosphere, accumulation of heat in the global system, and accumulation of meltwater in the oceans. How we respond to those, i.e. our emissions trajectory, is conditioned by some additional bathtubs: population, capital, and technology. This post is a quick look at the first.
I’ve grabbed the population sector from the World3 model. Regardless of what you think of World3’s economics, there’s not much to complain about in the population sector. It looks like this:
People are categorized into young, reproductive age, working age, and older groups. This 4th order structure doesn’t really capture the low dispersion of the true calendar aging process, but it’s more than enough for understanding the momentum of a population. If you think of the population in aggregate (the sum of the four boxes), it’s a bathtub that fills as long as births exceed deaths. Roughly tuned to history and projections, the bathtub fills until the end of the century, but at a diminishing rate as the gap between births and deaths closes:
Notice that the young (blue) peak in 2030 or so, long before the older groups come into near-equilibrium. An aging chain like this has a lot of momentum. A simple experiment makes that momentum visible. Suppose that, as of 2010, fertility suddenly falls to slightly below replacement levels, about 2.1 children per couple. (This is implemented by changing the total fertility lookup). That requires a dramatic shift in birth rates:
However, that doesn’t translate to an immediate equilibrium in population. Instead, population still grows to the end of the century, but reaching a lower level. Growth continues because the aging chain is internally out of equilibrium (there’s also a small contribution from ongoing extension of life expectancy, but it’s not important here). Because growth has been ongoing, the demographic pyramid is skewed toward the young. So, while fertility is constant per person of child-bearing age, the population of prospective parents grows for a while as the young grow up, and thus births continue to increase. Also, at the time of the experiment, the elderly population has not reached equilibrium given rising life expectancy and growth down the chain.
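The momentum argument can be reproduced with a far cruder model than World3. The sketch below is a four-stock aging chain with first-order transitions, integrated by Euler's method. It is not the World3 population sector: the initial stocks (in billions), the residence times, and the assumption that all deaths occur in the oldest cohort are round numbers chosen for illustration. Because nobody dies before old age in this toy version, replacement fertility is exactly 2.0 children per couple here rather than about 2.1.

```python
def simulate_aging_chain(tfr=2.0, years=90, dt=0.25):
    """Euler simulation of a toy four-cohort aging chain.

    Stocks are in billions of people and residence times in years. All
    numbers are illustrative assumptions, skewed toward the young the way
    a growing population would be.
    """
    young, reproductive, working, older = 1.85, 3.2, 1.4, 0.5
    t_young, t_repro, t_work, t_old = 15.0, 30.0, 20.0, 15.0

    totals = [young + reproductive + working + older]
    births_path = []
    for _ in range(int(years / dt)):
        # Half the reproductive cohort is female; each woman has `tfr`
        # children spread evenly over the reproductive years.
        births = (tfr / 2.0) * reproductive / t_repro
        maturing = young / t_young
        leaving_repro = reproductive / t_repro
        retiring = working / t_work
        deaths = older / t_old

        young += dt * (births - maturing)
        reproductive += dt * (maturing - leaving_repro)
        working += dt * (leaving_repro - retiring)
        older += dt * (retiring - deaths)

        births_path.append(births)
        totals.append(young + reproductive + working + older)
    return totals, births_path


totals, births_path = simulate_aging_chain()
print(f"initial total: {totals[0]:.2f} billion")
print(f"total after 90 years at replacement fertility: {totals[-1]:.2f} billion")
```

Even with fertility pinned at replacement from the first time step, the total keeps rising for decades and heads toward an equilibrium of roughly 9 billion under these assumptions, because the older stocks are still filling from the large young cohorts behind them. Births also rise for a while as the young move into the reproductive cohort.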
Achieving immediate equilibrium in population would require a much more radical fall in fertility, in order to bring births immediately in line with deaths. Implementing such a change would require shifting yet another bathtub – culture – in a way that seems unlikely to happen quickly. It would also have economic side effects. Often, you hear calls for more population growth, so that there will be more kids to pay social security and care for the elderly. However, that’s not the first effect of accelerated declines in fertility. If you look at the dependency ratio (the ratio of the very young and old to everyone else), the first effect of declining fertility is actually a net benefit (except to the extent that young children are intrinsically valued, or working in sweatshops making fake Gucci wallets):
The bottom line of all this is that, like other bathtubs, it’s hard to change population quickly, partly because of the physics of accumulation of people, and partly because it’s hard to even talk about the culture of fertility (and the economic factors that influence it). Population isn’t likely to contribute much to meeting 2020 emissions targets, but it’s part of the long game. If you want to win the long game, you have to anticipate long delays, which means getting started now.
System dynamics models handle data in various ways. Traditionally, time series inputs were embedded in so-called lookups or table functions (DYNAMO users will remember TABHL for example). Lookups are really best suited for graphically describing a functional relationship. They’re really cool in Vensim’s Synthesim mode, where you can change the shape of a relationship and watch the behavioral consequence in real time.
Time series data can be thought of as f(time), so lookups are often used as data containers. This works decently when you have a limited amount of data, but isn’t really suitable for industrial strength modeling. Those familiar with advanced versions of Vensim may be aware of data variables – a special class of equation designed for working with time series data rather than endogenous structure.
There are many advantages to working with data variables:
You can tell where there are data points, visually on graphs or in equations by testing for a special :NA: value indicating missing data.
You can easily determine the endpoints of a series and vary the interpolation method.
Data variables execute outside the main sequence of the model, so they don’t bog down optimization or Synthesim.
It’s easier to use diverse sources for data (Excel, text files, ODBC, and other model runs) with data variables.
You can see the data directly, without creating extra variables to manipulate it.
In calibration optimization, data variables contribute to the payoff only when it makes sense (i.e., when there’s real data).
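The last point, that missing data should contribute nothing to a calibration payoff, is easy to get wrong when data lives in lookups, because interpolation silently invents values between points. The sketch below shows the idea in plain Python. It is not Vensim's implementation: NaN stands in for the :NA: value, and the payoff form (a negative weighted sum of squared errors) and the function name are assumptions for illustration.

```python
import math

NA = float("nan")  # stand-in for a missing-data marker such as :NA:


def calibration_payoff(simulated, data, weight=1.0):
    """Negative weighted sum of squared errors over times that have data.

    Returns the payoff and the number of data points that contributed.
    Times where `data` is NaN are skipped rather than interpolated.
    """
    payoff = 0.0
    used = 0
    for sim, obs in zip(simulated, data):
        if math.isnan(obs):
            continue  # no measurement at this time, so no penalty
        payoff -= (weight * (sim - obs)) ** 2
        used += 1
    return payoff, used


simulated = [1.0, 2.0, 3.0, 4.0]
observed = [1.5, NA, NA, 3.0]  # only the first and last times were measured
payoff, used = calibration_payoff(simulated, observed)
print(payoff, used)  # → -1.25 2
```

Only the two real measurements enter the payoff, so the optimizer is not rewarded or punished for matching values that were never observed.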
I think there are just two reasons to use lookups as containers for data:
You want compatibility with Vensim PLE (e.g., for students)
You want to expose the data stream to quick manipulation in a user interface
Otherwise, go for data variables. Occasionally, there are technical limitations that make it impossible to accomplish something with a data equation, but in those cases the solution is generally a separate data model rather than use of lookups. More on that soon.
I’ve been testing a data mining and visualization tool called Tableau. It seems to be a hot topic in that world, and I can see why. It’s a very elegant way to access large database servers, slicing and dicing many different ways via a clean interface. It works equally well on small datasets in Excel. It’s very user-friendly, though it helps a lot to understand the relational or multidimensional data model you’re using. Plus it just looks good. I tried it out on some graphics I wanted to generate for a collaborative workshop on the Western Climate Initiative. Two examples:
A year or two back, I created a tool, based on VisAD, that uses the Vensim .dll to do multidimensional visualization of model output. It’s much cruder, but cooler in one way: it does interactive 3D. Anyway, I hoped that Tableau, used with Vensim, would be a good replacement for my unfinished tool.
After some experimentation, I think there’s a lot of potential, but it’s not going to be the match made in heaven that I hoped for. Cycle time is one obstacle: data can be exported from Vensim in .tab, .xls, or a relational table format (known as “data list” in the export dialog). If you go the text route (.tab), you have to pass through Excel to convert it to .csv, which Tableau reads. If you go the .xls route, you don’t need to pass through Excel, but may need to close/open the Tableau workspace to avoid file lock collisions. The relational format works, but yields a fundamentally different description of the data, which may be harder to work with.
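The detour through Excel for the .tab route could in principle be replaced by a few lines of script. Here is a minimal sketch using Python's standard csv module. It assumes the exported .tab file is ordinary tab-delimited text in UTF-8, which should be checked against an actual export. The function name and file names are hypothetical.

```python
import csv


def tab_to_csv(tab_path, csv_path, encoding="utf-8"):
    """Copy a tab-delimited text file to comma-separated form.

    Assumes plain tab-delimited input. The csv writer quotes any field
    that contains a comma, so variable names with commas survive intact.
    Returns the number of rows written.
    """
    rows = 0
    with open(tab_path, newline="", encoding=encoding) as src, \
         open(csv_path, "w", newline="", encoding=encoding) as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst)
        for row in reader:
            writer.writerow(row)
            rows += 1
    return rows


# Example call with hypothetical file names:
# tab_to_csv("run.tab", "run.csv")
```

A script like this can be run right after each export, which shortens the cycle time, though it does nothing about the file lock issue on the .xls route.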
I think where the pairing might really shine is with model output exported to a database server via Vensim’s ODBC features. I’m lukewarm on doing that with relational databases, because they just don’t get time series. A multidimensional database would be much better, but unfortunately I don’t have time to try at the moment.
Whether it works with models or not, Tableau is a nice tool, and I’d recommend a test drive.
NASA has an interesting article on the fall of the Maya. NASA-sponsored authors used climate models to simulate the effects of deforestation on local conditions. The result: evidence for a positive feedback cycle of lower yields, requiring greater deforestation to increase cultivated area, causing drought and increased temperatures, further lowering yields.
“They did it to themselves,” says veteran archeologist Tom Sever.
…
A major drought occurred about the time the Maya began to disappear. And at the time of their collapse, the Maya had cut down most of the trees across large swaths of the land to clear fields for growing corn to feed their burgeoning population. They also cut trees for firewood and for making building materials.
“They had to burn 20 trees to heat the limestone for making just 1 square meter of the lime plaster they used to build their tremendous temples, reservoirs, and monuments,” explains Sever.
…
“In some of the Maya city-states, mass graves have been found containing groups of skeletons with jade inlays in their teeth – something they reserved for Maya elites – perhaps in this case murdered aristocracy,” [Griffin] speculates.
No single factor brings a civilization to its knees, but the deforestation that helped bring on drought could easily have exacerbated other problems such as civil unrest, war, starvation and disease.
An SD Conference article by Tom Forest fills in some of the blanks on the other problems:
… this paper illustrates how humans can politically intensify resource shortages into universal disaster.
In the current model, the land sector has two variables. One is productivity, which is exhausted by people but regenerates over a period of time. The other… is Available Land. When population exceeds carrying capacity, warfare frequency and intensity increase enough to depopulate land. In the archaeological record this is reflected by the construction of walls around cities and the abandonment of farmlands outside the walls. Some land becomes unsafe to use because of conflict, which then reduces the carrying capacity and intensifies warfare. This is an archetypal death spiral. Land is eventually reoccupied, but more slowly than the abandonment. A population collapse eventually hastens the recovery of productivity, so after the brief but severe collapse growth resumes from a much lower level.
…
The key dynamic is that people do not account for the future impact of their numbers on productivity, and therefore production, when they have children. Nor does death by malnutrition and starvation have an immediate effect. This leads to an overshoot, as in the Limits to Growth, but the policy response is warfare proportionate to the shortfall, which takes more land out of production and worsens the shortfall.
Put another way, in the growth phase people are in a positive-sum game. There is more to go around, more wealth to share, and population increase is unhindered by policy or production. But once the limits are reached, people are in a zero-sum game, or even slightly negative-sum. Rather than share the pain, people turn on each other to increase their personal share of a shrinking pie at the expense of others. The unintended consequence – the fatal irony – is that by doing so, the pie shrinks much faster than it would otherwise. Apocalypse is the result.
Making climate endogenous in Forest’s model would add another positive feedback loop, deepening the trap for a civilization that crosses the line from resource abundance to scarcity and degradation.