This is what the reinforcing loop of practice > skill > fun > motivation > practice gets you:
At ISDC 2018, we gave the Dana Meadows Award for best student paper to Gizem Aktas, for Modeling the Biological Mechanisms that Determine the Dynamics of Stress Response of the Human Body (with Yaman Barlas). This is a very interesting paper that elegantly synthesizes literature on stress, mood, and hormone interactions. I plan to write more about it later, but for the moment, here’s the model for your exploration.
The dynamic stress response of the human body to stressors is produced by nonlinear interactions among its physiological sub-systems. The evolutionary function of the response is to enable the body to cope with stress. However, depending on the intensity and frequency of the stressors, the mechanism may lose its function and the body can go into a pathological state. Three subsystems of the body play the most essential role in the stress response: endocrine, immune and neural systems. We constructed a simulation model of these three systems to imitate the stress response under different types of stress stimuli. Cortisol, glucocorticoid receptors, proinflammatory cytokines, serotonin, and serotonin receptors are the main variables of the model. Using both qualitative and quantitative physiological data, the model is structurally and behaviorally well-validated. In subsequent scenario runs, we have successfully replicated the development of major depression in the body. More interestingly, the model can present quantitative representation of some very well acknowledged qualitative hypotheses about the stress response of the body. This is a novel quantitative step towards the comprehension of stress response in relation with other disorders, and it provides us with a tool to design and test treatment methods.
The original is a STELLA model; here I’ve translated it to Vensim and made some convenience upgrades. I used the forthcoming XMILE translation in Vensim to open the model. You get an ugly diagram (due to platform differences and XMILE’s lack of support for flow-clouds), but it’s functional enough to browse. I cleaned up the diagrams and moved them into multiple views to take better advantage of Vensim’s visual approach.
The model ran right away, though I had to add one MAX statement to handle a uniflow (not supported in Vensim, and something I remain allergic to). There’s actually an important lesson on model replication and calibration in this.
When I first translated the model, I ran a few scenarios, using the comprehensive replication instructions in the supplemental material for the paper. I built up a Vensim command script to make it easy to replicate all the scenarios in the paper. To do that, I had to modify the equations a bit, so that manual equation editing (in STELLA) could be replaced by automatic parameter changes.
Then I ran my script and eyeballed a few graphs. Things looked pretty good:
The same, right? Not so fast! If you look closely, you’ll find that the Vensim version (bottom) has 9 peaks instead of 10, due to my replacement of a cascade of IF … ELSE test inputs with a simpler PULSE TRAIN. When you fix the count, there are still issues, because the duration parameter for each pulse (0.2) is not an integral multiple of the TIME STEP. (Incidentally, differences arising from PULSE implementations are tricky – see Yutaka Takahashi’s poster from ISDC 2018).
It took me several iterations to work out what was going wrong. I found that, to really verify that the translation (plus my initially erroneous upgrades) was OK, I had to export a run from STELLA, import it as a dataset in Vensim, and compare behavior hour by hour. That’s how I discovered the subtle but important uniflow difference.
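The pulse-width quantization issue is easy to reproduce in miniature. Here's a sketch (plain Python Euler stepping, not Vensim's or STELLA's actual PULSE TRAIN implementation) showing how the delivered pulse "area" depends on whether the pulse width is an integral multiple of the time step:

```python
# Sketch: how a discrete-time pulse train's delivered area depends on TIME STEP.
# A pulse of height 1 is "on" when sim time falls within [k*interval, k*interval + width).
# This mimics generic Euler integration of a pulse input, not any vendor's exact code.

def pulse_area(width, interval, n_pulses, dt):
    """Total integrated pulse input over the run, via Euler accumulation."""
    t_end = interval * n_pulses
    n_steps = int(round(t_end / dt))
    area = 0.0
    for i in range(n_steps):
        t = i * dt
        if (t % interval) < width:  # pulse active at this step
            area += dt              # rectangle of height 1, width dt
    return area

ideal = 0.2 * 10  # 10 pulses of width 0.2 should deliver area 2.0
for dt in (0.25, 0.125, 0.0625, 0.03125):  # binary dt values, exact in floating point
    a = pulse_area(width=0.2, interval=24.0, n_pulses=10, dt=dt)
    print(f"dt={dt}: area={a:.4f} (ideal {ideal})")
```

Because 0.2 never lands on the binary time-step grid, every one of these runs overshoots or undershoots the ideal area, which is exactly the kind of discrepancy that shows up when comparing translated runs hour by hour.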
The fact that tiny differences in test input implementations matter highlights the extreme numerical sensitivity of the model. This is a feature, not a bug. It arises from positive feedback that creates sensitive thresholds in stress response: 5% more episodic stress can be the difference between routine recovery and total collapse.
For example, here’s a sensitivity experiment with external stress at 10, 20, 30, 40, 50 & 60 units:
Notice that for external stress <= 40, recovery is quick – hours to days. But somewhere above 40 is a nonlinear threshold, beyond which recovery takes weeks.
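The paper's model is far richer, but the kind of sensitive threshold described above can be sketched with a toy bistable stock. This is my illustration, not the Aktas & Barlas equations: a single state with linear recovery and a sigmoidal positive feedback, where a stress pulse either decays back to baseline or tips the system into a persistent high-stress equilibrium.

```python
# Toy illustration of a sensitive threshold created by positive feedback.
#   dx/dt = -x + 3*x^2/(1 + x^2) + stress(t)
# Stable equilibria near x=0 (healthy) and x~2.62 (pathological),
# separated by an unstable threshold near x~0.38.
# NOT the Aktas & Barlas model -- just a minimal bistable sketch.

def simulate(pulse_height, pulse_width=1.0, dt=0.01, t_end=50.0):
    x = 0.0
    for i in range(int(t_end / dt)):
        t = i * dt
        stress = pulse_height if t < pulse_width else 0.0
        dxdt = -x + 3.0 * x**2 / (1.0 + x**2) + stress
        x += dt * dxdt  # Euler integration
    return x  # state long after the pulse ends

low = simulate(pulse_height=0.3)   # below threshold: recovers to ~0
high = simulate(pulse_height=0.5)  # above threshold: locks into the high state
print(f"mild pulse   -> x = {low:.3f}")
print(f"strong pulse -> x = {high:.3f}")
```

A modest difference in the stress pulse produces a qualitative difference in outcome, which is why tiny implementation details in the test inputs matter so much during replication.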
This .zip archive contains:
- An updated source model (.stmx) from the author, used for the translation.
- The translated model (.mdl and .vpm). This version won’t work in PLE because it uses macros, but you can use the free Model Reader to run it.
- Command scripts for replicating the paper’s scenarios, plus the vector of stress levels above.
Update: StressResponseModel_converted 7b.zip fixes a unit error in a test input (my mistake) – this version is closest to the original in the paper.
Update 2: StressResponseModel_converted 8.zip has an improved control panel and runs 4x faster. It departs from the original to improve sensitivity analysis capability and pulse test stability, but remains dynamically identical (as far as I can determine).
The original paper and supplementary material should be in the conference submission system.
Stay tuned for more on this topic!
Box’s famous comment, that “all models are wrong,” gets repeated ad nauseam (even by me). I think it’s essential to be aware of this in the sloppy sciences, but it does a disservice to modeling and simulation in general.
As far as I’m concerned, a lot of models are basically right. I recently worked with some kids on an air track experiment in physics. We timed a sled released from various heights, and plotted the data. Then we used a quadratic fit, based on a simple dynamic model, to predict the next point. We were within a hundredth of a second, confirmed by video analysis.
Sure, we omitted lots of things, notably air resistance and relativity. But so what? There’s no useful sense in which the model was “wrong,” anywhere near the conditions of the experiment. (Not surprisingly, you can find a few cranks who contest Newton’s laws anyway.)
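The exercise above amounts to exploiting the fact that constant acceleration implies a quadratic position-vs-time curve. Here's a sketch of the same idea with made-up numbers (fitting position rather than the timings we measured in class):

```python
# Constant acceleration => quadratic position vs. time, so a degree-2 fit
# to the early points predicts the next one. Numbers are illustrative,
# not the actual classroom data.
import numpy as np

np.random.seed(0)
a = 0.35  # assumed sled acceleration on the tilted track, m/s^2
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
x = 0.5 * a * t**2 + np.random.normal(0, 0.001, t.size)  # mm-scale measurement noise

coeffs = np.polyfit(t, x, 2)       # fit x = c2*t^2 + c1*t + c0
x_pred = np.polyval(coeffs, 2.5)   # predict the next point
x_true = 0.5 * a * 2.5**2
print(f"predicted {x_pred:.4f} m vs. actual {x_true:.4f} m")
```

The fitted coefficients recover the underlying "physics" (c2 ≈ a/2) well enough that the extrapolated point lands within millimeters.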
I think a lot of uncertain phenomena in social sciences operate on a backbone of the same kind of “physics.” The future behavior of the government is quite unpredictable, but there isn’t much uncertainty about accounting, e.g., that increasing the deficit increases the debt.
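That accounting backbone is literal stock-and-flow arithmetic; a sketch with made-up numbers:

```python
# The accounting "physics": debt is the accumulation (integral) of the deficit.
# Illustrative numbers only.
debt = 100.0
deficits = [5.0, 7.0, 6.0, 8.0]  # annual deficits
for d in deficits:
    debt += d  # each year's deficit flows into the debt stock
print(debt)  # -> 126.0
```

Whatever the government does, this identity holds; the uncertainty is in the flows, not the arithmetic.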
The domain of wrong but useful models remains large (within an even bigger sea of simple ignorance), but I think more and more things are falling into the category of models that are basically right. The trick is to be able to spot the difference. Some people clearly can’t:
A&G provide no formal method to distinguish between situations in which models yield useful or spurious forecasts. In an earlier paper, they claimed rather broadly,
‘To our knowledge, there is no empirical evidence to suggest that presenting opinions in mathematical terms rather than in words will contribute to forecast accuracy.’ (page 1002)
This statement may be true in some settings, but obviously not in general. There are many situations in which mathematical models have good predictive power and outperform informal judgments by a wide margin.
I wonder how well one could do with verbal predictions of a simple physical system? Score one for the models.
This is the model library entry for my ISDC 2017 plenary paper with Larry Yeager on dynamic cohorts in Ventity:
Dynamic cohorts: a new approach to managing detail
While it is desirable to minimize the complexity of a model, some problems require the detailed representation of heterogeneous subgroups, where nonlinearities prevent aggregation or explicit chronological aging is needed. It is desirable to have a representation that avoids burdening the modeler or user computationally or cognitively. Eberlein & Thompson (2013) propose continuous cohorting, a novel solution to the cohort blending problem in population modeling, and test it against existing aging chain and cohort-shifting approaches. Continuous cohorting prevents blending of ages and other properties, at some cost in complexity.
We propose another new solution, dynamic cohorts, that prevents blending with a comparatively low computational burden. More importantly, the approach simplifies the representation of distinct age, period and cohort effects and representation of dynamics other than the aging process, like migration and attribute coflows. By encapsulating the lifecycle of a representative cohort in a single entity, rather than dispersing it across many states over time, it makes it easier to develop and explain the model structure.
Paper: Dynamic Cohorts P1363.pdf
Models: Dynamic Cohorts S1363.zip
Presentation slides: Dynamic Cohorts Fid Ventana v2b.pdf
I’ve previously written about this here.
Hoisted from a comment by Luzi on Vi Hart on positive feedback driving polarization:
Can you find the point where the many positive feedback loops can be balanced or broken?
Every couple of years, an article comes out reviewing the performance of the World3 model against data, or constructing an alternative, extended model based on World3. Here’s the latest:
This study investigates the notion of limits to socioeconomic growth with a specific focus on the role of climate change and the declining quality of fossil fuel reserves. A new system dynamics model has been created. The World Energy Model (WEM) is based on the World3 model (The Limits to Growth, Meadows et al., 2004) with climate change and energy production replacing generic pollution and resources factors. WEM also tracks global population, food production and industrial output out to the year 2100. This paper presents a series of WEM’s projections; each of which represent broad sweeps of what the future may bring. All scenarios project that global industrial output will continue growing until 2100. Scenarios based on current energy trends lead to a 50% increase in the average cost of energy production and 2.4–2.7 °C of global warming by 2100. WEM projects that limiting global warming to 2 °C will reduce the industrial output growth rate by 0.1–0.2%. However, WEM also plots industrial decline by 2150 for cases of uncontrolled climate change or increased population growth. The general behaviour of WEM is far more stable than World3 but its results still support the call for a managed decline in society’s ecological footprint.
The new paper puts economic collapse about a century later than it occurred in Limits. But that presumes that the simplification noted above – climate change and energy production replacing the generic pollution and resources factors – is legitimate: GHGs are the only pollutant, and energy the only resource, that matters. Are we really past the point of concern over PCBs, heavy metals, etc., with all future chemical and genetic technologies free of risk? Well, maybe … (Note that climate integrated assessment models generally indulge in the same assumption.)
But quibbling over dates is to miss a key point of Limits to Growth: the model, and the book, are not about point prediction of collapse in year 20xx. The central message is about a persistent overshoot behavior mode in a system with long delays and finite boundaries, when driven by exponential growth.
We have deliberately omitted the vertical scales and we have made the horizontal time scale somewhat vague because we want to emphasize the general behavior modes of these computer outputs, not the numerical values, which are only approximately known.
There are two ways to go about building a model.
- Plan A proceeds slowly. You build small or simple, aggregate components, and test each thoroughly before moving on.
- Plan B builds a rough model spanning the large scope that you think encompasses the problem, then incrementally improves the solution.
Ideally, both approaches converge to the same point.
Plan B is attractive, for several reasons. It helps you to explore a wide range of ideas. It gives a satisfying illusion of rapid progress. And, most importantly, it’s satisfying for stakeholders, who typically have a voracious appetite for detail and a limited appreciation of dynamics.
The trouble is, Plan B does not really exist. When you build a lot of structure quickly, the sacrifice you have to make is ignoring lots of potential interactions, consistency checks, and other relationships between components. You’re creating a large backlog of undiscovered rework, which the extensive SD literature tells us is fatal. So, you’re really on Path C, which leads to disaster: a large, incomprehensible, low-quality model.
In addition, you rarely have as much time as you think you do. When your work gets cut short, only Path A gives you an end product that you can be proud of.
So, resist pressures to include every detail. Embrace elegant simplicity and rich feedback. Check your units regularly, test often, and “always be done” (as Jim Hines puts it). Your life will be easier, and you’ll solve more problems in the long run.
My award for dumbest headline of the week goes to The Atlantic:
Climate Change Can Be Stopped by Turning Air Into Gasoline
A team of scientists from Harvard University and the company Carbon Engineering announced on Thursday that they have found a method to cheaply and directly pull carbon-dioxide pollution out of the atmosphere.
If their technique is successfully implemented at scale, it could transform how humanity thinks about the problem of climate change. It could give people a decisive new tool in the race against a warming planet, but could also unsettle the issue’s delicate politics, making it all the harder for society to adapt.
Their research seems almost to smuggle technologies out of the realm of science fiction and into the real. It suggests that people will soon be able to produce gasoline and jet fuel from little more than limestone, hydrogen, and air. It hints at the eventual construction of a vast, industrial-scale network of carbon scrubbers, capable of removing greenhouse gases directly from the atmosphere.
The underlying article that triggered the story has nothing to do with turning CO2 into gasoline. It’s purely about lower-cost direct capture of CO2 from the air (DAC). Even if we assume that the article’s right, and DAC is now cheaper, that in no way means “climate change can be stopped.” There are several huge problems with that notion:
First, if you capture CO2 from the air, make a liquid fuel out of it, and burn that in vehicles, you’re putting the CO2 back in the air. This doesn’t reduce CO2 in the atmosphere; it just reduces the growth rate of CO2 in the atmosphere by displacing the fossil carbon that would otherwise be used. With constant radiative forcing from elevated CO2, temperature will continue to rise for a long time. You might get around this by burning the fuel in stationary plants and sequestering the CO2, but there are huge problems with that as well. There are serious sink constraint problems, and lots of additional costs.
Second, just how do you turn all that CO2 into fuel? The additional step is not free, nor is it conventional Fischer-Tropsch technology, which starts with syngas from coal or gas. You need relatively vast amounts of energy and hydrogen to do it on the necessary gigatons/year scale. One estimate puts the cost of such fuels at $3.80-9.20 a gallon (some of the costs overlap, but it’ll be more at the pump, after refining and marketing).
Third, who the heck is going to pay for all of this? If you want to just offset global emissions of ~40 gigatons CO2/year at the most optimistic cost of $100/ton, with free fuel conversion, that’s $4 trillion a year. If you’re going to cough up that kind of money, there are a lot of other things you could do first, but no one has an incentive to do it when the price of emissions is approximately zero.
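The arithmetic behind that figure is worth making explicit (illustrative round numbers from the paragraph above):

```python
# Back-of-envelope cost to offset global emissions via direct air capture,
# at the most optimistic capture cost and with free fuel conversion.
emissions_gt = 40.0    # global CO2 emissions, gigatons/year
cost_per_ton = 100.0   # optimistic DAC cost, $/ton
tons_per_gt = 1e9

annual_cost = emissions_gt * tons_per_gt * cost_per_ton
print(f"${annual_cost:.2e} per year")  # -> $4.00e+12, i.e., $4 trillion/year
```

For comparison, that is on the order of a few percent of gross world product, every year, before any fuel-synthesis costs.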
Ironically, the Carbon Engineering team seems to be aware of these problems:
Keith said it was important to still stop emitting carbon-dioxide pollution where feasible. “My view is we should stick to trying to cut emissions first. As a voter, my view is it’s cheaper not to emit a ton of [carbon dioxide] than it is to emit it and recapture it.”
I think there are two bottom lines here:
- Anyone who claims to have a silver bullet for a problem that pervades all human enterprise is probably selling snake oil.
- Without a substantial emissions price as the primary incentive guiding market decisions about carbon intensity, all large scale abatement efforts are a fantasy.
Climate skeptics seem to have a thing for contests and bets. For example, there’s Armstrong’s proposed bet, baiting Al Gore. Amusingly (for data nerds anyway), the bet, which pitted a null forecast against the taker’s chosen climate model, could have been beaten easily by either a low-order climate model or a less-naive null forecast. And, of course, it completely fails to understand that climate science is not about fitting a curve to the global temperature record.
Another instance of such foolishness recently came to my attention. It doesn’t have a name that I know of, but here’s the basic idea:
- The author generates 1000 time series:
Each series has length 135: the same length as that of the most commonly studied series of global temperatures (which span 1880–2014). The 1000 series were generated as follows. First, 1000 random series were obtained (for more details, see below). Then, some of those series were randomly selected and had a trend added to them. Each added trend was either 1°C/century or −1°C/century. For comparison, a trend of 1°C/century is greater than the trend that is claimed for global temperatures.
- The challenger pays $10 for the privilege of attempting to detect which of the 1000 series are perturbed by a trend, winning $100,000 for correctly identifying 90% or more.
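The setup can be sketched from the quoted description alone. Note that the author's actual noise-generating process is not reproduced here; plain white noise stands in, which would make trend detection far easier than the real contest:

```python
# Sketch of the contest's series generation, per the quoted description only.
# The actual noise process (revealed later by the author) is NOT reproduced;
# Gaussian white noise is a stand-in assumption.
import random

random.seed(1)
N_SERIES, LENGTH = 1000, 135
TREND = 0.01  # 1 degC/century = 0.01 degC/year

series, has_trend = [], []
for _ in range(N_SERIES):
    trended = random.random() < 0.5  # randomly select series to perturb
    slope = random.choice([TREND, -TREND]) if trended else 0.0
    s = [random.gauss(0.0, 0.1) + slope * t for t in range(LENGTH)]
    series.append(s)
    has_trend.append(trended)

print(sum(has_trend), "of", N_SERIES, "series carry a trend")
```

Everything hinges on the unstated noise process: with highly autocorrelated noise, a ±1°C/century trend over 135 years can be made statistically indistinguishable from the noise itself.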
The best challenger managed to identify 860 series, so the prize went unclaimed. But only two challenges are described, so I have to wonder how many serious attempts were made. Had I known about the contest in advance, I would not have tried it. I know plenty about fitting dynamic models to data, though abstract statistical methods aren’t really my thing. But I still have to ask myself some questions:
- Is there really money to be made, or will the author simply abscond to the pub with my $10? For the sake of argument, let’s assume that the author really has $100k at stake.
- Is it even possible to win? The author did not reveal the process used to generate the series in advance. That alone makes this potentially a sucker bet. If you’re in control of the noise and structure of the process, it’s easy to generate series that are impossible to reliably disentangle. (Tellingly, the author later revealed the code to generate the series, but it appears there’s no code to successfully identify 90%!)
For me, the statistical properties of the contest make it an obvious non-starter. But does it have any redeeming social value? For example, is it an interesting puzzle that has something to do with actual science? Sadly, no.
The hidden assumption of the contest is that climate science is about estimating the trend of the global temperature time series. Yes, people do that. But it’s a tiny fraction of climate science, and it’s a diagnostic of models and data, not a real model in itself. Science in general is not about such things. It’s about getting a good model, not a good fit. In some places the author talks about real physics, but ultimately seems clueless about this – he’s content with unphysical models:
Are ARIMA models truly appropriate for climatic time series? I do not have an opinion. There seem to be no persuasive arguments for or against using ARIMA models. Rather, studying such models for climatic series seems to be a worthy area of research.
Liljegren’s argument against ARIMA is that ARIMA models have a certain property that the climate system does not have. Specifically, for ARIMA time series, the variance becomes arbitrarily large, over long enough time, whereas for the climate system, the variance does not become arbitrarily large. It is easy to understand why Liljegren’s argument fails.
It is a common aphorism in statistics that “all models are wrong”. In other words, when we consider any statistical model, we will find something wrong with the model. Thus, when considering a model, the question is not whether the model is wrong—because the model is certain to be wrong. Rather, the question is whether the model is useful, for a particular application. This is a fundamental issue that is commonly taught to undergraduates in statistics. Yet Liljegren ignores it.
As an illustration, consider a straight line (with noise) as a model of global temperatures. Such a line will become arbitrarily high, over long enough time: e.g. higher than the temperature at the center of the sun. Global temperatures, however, will not become arbitrarily high. Hence, the model is wrong. And so—by an argument essentially the same as Liljegren’s—we should not use a straight line as a model of temperatures.
In fact, a straight line is commonly used for temperatures, because everyone understands that it is to be used only over a finite time (e.g. a few centuries). Over a finite time, the line cannot become arbitrarily high; so, the argument against using a straight line fails. Similarly, over a finite time, the variance of an ARIMA time series cannot become arbitrarily large; so, Liljegren’s argument fails.
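For reference, the variance property at issue – that an integrated (ARIMA, d=1) process wanders without bound – is easy to see in a minimal random walk simulation:

```python
# Unbounded variance of an integrated process: for a random walk
# x_t = x_{t-1} + e_t, the variance grows linearly, var(x_t) = t * sigma^2.
import random

random.seed(42)
SIGMA, T, N = 1.0, 400, 2000  # step std dev, walk length, number of walks

def walk_endpoint():
    x = 0.0
    for _ in range(T):
        x += random.gauss(0.0, SIGMA)
    return x

ends = [walk_endpoint() for _ in range(N)]
mean = sum(ends) / N
var = sum((e - mean) ** 2 for e in ends) / N
print(f"empirical variance after {T} steps: {var:.1f} (theory: {T * SIGMA**2:.0f})")
```

The physical climate has no such property: energy balance feedbacks bound the temperature's excursions, which is the substance of the objection being brushed aside.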
Actually, no one in climate science uses straight lines to predict future temperatures, because forcing is rising, and therefore warming will accelerate. But that’s a minor quibble, compared to the real problem here. If your model is:
global temperature = f( time )
you’ve just thrown away 99.999% of the information available for studying the climate. (Ironically, the author’s entire point is that annual global temperatures don’t contain a lot of information.)
No matter how fancy your ARIMA model is, it knows nothing about conservation laws, robustness in extreme conditions, dimensional consistency, or real physical processes like heat transfer. In other words, it fails every reality check a dynamic modeler would normally apply, except the weakest – fit to data. Even its fit to data is near-meaningless, because it ignores all other series (forcings, ocean heat, precipitation, etc.) and has nothing to say about replication of spatial and seasonal patterns. That’s why this contest has almost nothing to do with actual climate science.
This is also why data-driven machine learning approaches have a long way to go before they can handle general problems. It’s comparatively easy to learn to recognize the cats in a database of photos, because the data spans everything there is to know about the problem. That’s not true for systemic problems, where you need a web of data and structural information at multiple scales in order to understand the situation.