Mental Models vs. Models in the Loop

Timothy Clancy, Saeed P. Langarudi and Raafat Zaini have an interesting new commentary in the SDR.

Never the strongest: reconciling the four schools of thought in system dynamics in the debate on quality

With the passing of Jay Forrester, the field of system dynamics exists at a similar crossroads. Debates of implicit, if not explicit, inheritance and future direction are already breaking out among competing generals. Who owns Forrester’s legacy? Will we proceed down the reference mode of the Macedonian and Mughal Empires—or will we instead seek an alternative reference mode of Alexandria: integration, reconciliation, and mutually recognized coexistence of different schools within the broader field of system dynamics?

We suggest the latter path—and that begins by recognizing at least four, if not more, distinct schools of thought on how to approach system dynamics and the study of complex systems. We believe these schools arise from differing mental models in the field and the consequences that arise in practice from these differences.

I haven’t really absorbed it yet, so I’ll refrain from direct comment, but it did spur me to finish off a draft of some similar thoughts on these questions.

I personally lean very much toward the hard science, data-driven side of the field: what the authors call the Empirical school of thought. But as a policy, I lean toward a big tent view of the field that includes work with low model content (which I don’t equate with low quality).

I think the central tension in the debate has already been posed by JWF and others long ago – all the way back to Industrial Dynamics really. In Some Basic Concepts in System Dynamics (2009), Forrester summarized,

The basic feedback loop in Figure 4 is too simple to represent real-world situations. But simple loops have more serious shortcomings—they are misleading and teach the wrong lessons. Most of our intuitive learning comes from very simple systems. The truths learned from simple systems are often completely opposite from the behavior of more complex systems. A person understands filling a water glass, as in Figure 3. But, if we go to a system that is only five times as complicated, as in Figure 5, intuition fails. A person cannot look at Figure 5 and anticipate the behavior of the pictured system.

Figure 5 from World Dynamics is five times more complicated than Figure 4 in the sense that it has five stocks—the rectangles in the figure. The figure shows how rapidly apparent complexity increases as more system stocks are added.

Mathematicians would describe Figure 5 as a fifth-order, nonlinear, dynamic system. No one can predict the behavior by studying the diagram or its underlying equations. Only by using computer simulation can the implied behavior be revealed.

I think the message is pretty clear here. To solve complex problems, you must formally simulate the system because mental simulations are treacherous. I’d go even one step further, and argue that it’s not sufficient to simulate the system once, figure out where the leverage point is, implement the solution, and toss out the model when you’re done. The simulation needs to become an ongoing part of the loop for model predictive control.

If that’s the ideal, why settle for anything less? I think there are a number of possible answers.

  1. Even in a perfect world where it’s easy to construct the model-in-the-loop, you need buy-in from the participants in the system to implement the model, and that requires a skillset that’s quite distinct.
  2. While it’s true that no one can intuit the behavior of a 10th order system, there might be a lot of value in managing low-order components of the system that are amenable to mental simulation or simple decision rules. There might be two reasons for this:
    • The complex system is dominated by a few key parameters (as in sloppy systems).
    • Risking global suboptimization by improving locally is better than optimizing nothing (though this might be a matter of luck).
  3. Often no single stakeholder in the system has the resources or authority to implement needed changes. But exposing the connectivity of the system, even if you can’t predict exactly how it works, is sometimes enough to catalyze creation of higher-level structures that enable change in the future.
  4. Not everyone is, or wants to be, a modeler. Moreover some participants in the system may reject models, data, and pretty much everything else since the Enlightenment, but you still have to include them.
  5. The non-modelers, as participants in the system, hold key knowledge that the modelers need.
  6. A qualitative map of a system is a good start towards an eventual quantitative model.
  7. Not every problem is big enough to model.

I’m sure you can probably think of more. I think these are good reasons to embrace non-model-based work on systems, as long as one refrains from making strong predictions about behavior from incomplete descriptions of behavior. Fortunately that leaves a lot of interesting things to think about.

I think the opposite perspective, that nothing is worth doing without a model and data, requires some counterexamples. Are there instances in which a group mapping exercise, playing a dynamic game, or engaging in cross-functional dialog led to reduced performance? I’m not aware of good examples of this, and certainly not of good diagnoses of the outcome. Attribution in complex systems is notoriously difficult. I think what this suggests is that we need stronger links to the evaluation research community, because we don’t really know what works and what doesn’t. We already have some strength in this area from the dynamic decision making experiment thread of SD, but … physician heal thyself.

There is one thing that troubles me though, just beyond the boundaries of our field. It’s climate policy (and related global issues). Most climate policy advocates are in some sense systems thinkers. Many build nice diagrams or use other systemic tools. If you don’t care about systems, it’s hard to see why you’d care about climate to begin with.

Yet … it seems that a substantial fraction of people who are pro-climate policy favor policies that are counterproductive or insufficient. They like low-carbon fuel standards that are unstable, inefficient, and can even increase emissions. They like standards that allocate more property rights to bigger polluters, or simply make it harder to change. They like to impose constraints on new fossil supply that work exactly like OPEC to increase prices and profits for incumbent producers. They subsidize EVs and solar, increasing the incentive to consume energy and congest roads, with benefits accruing to the rich who can afford the capital outlay.

What this means is that my reason #2, “Risking global suboptimization by improving locally is better than optimizing nothing,” isn’t working out too well. I think this is exactly the kind of counterintuitive behavior of social systems that JWF was referring too. I don’t believe you can sort these things out with CLDs or other qualitative methods, except perhaps when they are used as explanatory tools for underlying formal models.

I think the bottom line is that, inside the big tent, the tall pole must remain construction and validation of robust behavioral dynamic models.

Feedback is Interdisciplinary

Quite a while ago, I wrote about modeling the STEM workforce:

An integrated model needs three things: what, how, and why. The “what” is the state of the system – stocks of students, workers, teachers, etc. in each part of the system. Typically this is readily available – Census, NSF and AAAS do a good job of curating such data. The “how” is the flows that change the state. There’s not as much data on this, but at least there’s good tracking of graduation rates in various fields, and the flows actually integrate to the stocks. Outside the educational system, it’s tough to understand the matrix of flows among fields and economic sectors, and surprisingly difficult even to get decent measurements of attrition from a single organization’s personnel records. The glaring omission is the “why” – the decision points that govern the aggregate flows. Why do kids drop out of science? What attracts engineers to government service, or the finance sector, or leads them to retire at a given age? I’m sure there are lots of researchers who know a lot about these questions in small spheres, but there’s almost nothing about the “why” questions that’s usable in an integrated model.

I think the current situation is a result of practicality rather than a fundamental philosophical preference for analysis over synthesis. It’s just easier to create, fund and execute standalone micro research than it is to build integrated models.

According to Jay Forrester, Gordon Brown said it much more succinctly:

The message is in the feedback, and the feedback is inherently
interdisciplinary.

AI doesn’t help modelers

Large language model AI doesn’t help with modeling. At least, that’s my experience so far.


DALL-E images from Bing image creator.

On the ACM blog, Bertrand Meyer argues that AI doesn’t help programmers either. I think his reasons are very much compatible with what I found attempting to get ChatGPT to discuss dynamics:

Here is my experience so far. As a programmer, I know where to go to solve a problem. But I am fallible; I would love to have an assistant who keeps me in check, alerting me to pitfalls and correcting me when I err. A effective pair-programmer. But that is not what I get. Instead, I have the equivalent of a cocky graduate student, smart and widely read, also polite and quick to apologize, but thoroughly, invariably, sloppy and unreliable. I have little use for such  supposed help.

He goes on to illustrate by coding a binary search. The conversation is strongly reminiscent of our attempt to get ChatGPT to model jumping through the moon.

And then I stopped.

Not that I had succumbed to the flattery. In fact, I would have no idea where to go next. What use do I have for a sloppy assistant? I can be sloppy just by myself, thanks, and an assistant who is even more sloppy than I is not welcome. The basic quality that I would expect from a supposedly intelligent  assistant—any other is insignificant in comparison —is to be right.

It is also the only quality that the ChatGPT class of automated assistants cannot promise.

I think the fundamental problem is that LLMs aren’t “reasoning” about dynamics per se (though I used the word in my previous posts). What they know is derived from the training corpus, and there’s no reason to think that it reflects a solid understanding of dynamic systems. In fact there are presumably lots of examples in the corpus of failures to reason correctly about dynamic causality, even in the scientific literature.

This is similar to the reason AI image creators hallucinate legs and fingers: they know what the parts look like, but they don’t know how the parts work together to make the whole.

To paraphrase Meyer, LLM AI is the equivalent of a polite, well-read assistant who lacks an appreciation for complex systems, and aggressively indulges in laundry-list, dead-buffalo thinking about all but the simplest problems. I have no use for that until the situation improves (and there’s certainly hope for that). Worse, the tools are very articulate and confident in their clueless pronouncements, which is a deadly mix of attributes.

Related: On scientific understanding with artificial intelligence | Nature Reviews Physics

Sources of Information for Modeling

The traditional picture of information sources for modeling is a funnel. For example, in Some Basic Concepts in System Dynamics (2009), Forrester showed:

I think the diagram, or at least the concept, is much older than that.

However, I think the landscape has changed a lot, with more to come. Generally, the mental database hasn’t changed too much, but the numerical database has grown a lot. The funnel isn’t 1-dimensional, so the relationships have changed on some axes, but not so much on others.

Notionally, I’d propose that the situation is something like this:

The mental database is still king for variety of concepts and immediacy or salience of information (especially to the owner of the brain involved). And, it still has some weaknesses, like the inability to easily observe, agree on and quantify the constructs included in it. In the last few decades, the numerical database has extended its reach tremendously.

The proper shape of the plot is probably very domain specific. When I drew this, I had in mind the typical corporate or policy setting, where information systems contain only a fraction of the information necessary to understand the organizations involved. But in some areas, the reverse may be true. For example, in earth systems, datasets are vast and include measurements that human senses can’t even make, whereas personal experience – and therefore mental models – is limited and treacherous.

I think I’ve understated the importance of the written database in the diagram above – perhaps I’m missing a dimension characterizing its cumulative nature (compared to the transience of mental databases). There’s also an interesting evolution underway, as tools for text analysis and large language models (ChatGPT) are making the written database more numerical in nature.

Finally, I think there’s a missing database in the traditional framework, which has growing importance. That’s the database of models themselves. They’ve been around for a long time – especially in physical sciences, but also corporate spreadsheets and the like. But increasingly, reasonably sophisticated models of organizational components are available as inputs to higher-level strategic problem solving modeling efforts.

Scientific Revolutions in Ventity

I’ve long wanted to translate the Sterman-Wittenberg model of Kuhnian paradigm revolutions to Ventity. The original was in Dynamo, and I translated that to Vensim, but neither is really satisfactory, because both require provisioning array space for new paradigms statically, before it’s needed. This means simulating lots of useless 0s, and even worse, looking at them in the output.

The model is about the lifecycle of scientific paradigms, so a central feature is the occasional introduction and evolution of new paradigms, which eventually accumulate enough anomalies to erode confidence, making them vulnerable to the next great idea. So ideally, you’d like to introduce new paradigms dynamically and delete them when they no longer have many adherents. Dynamic creation and deletion of entities is of course a core feature of Ventity – it’s the tool this model has been waiting for all those years.

I finally got around to translating my Vensim version to Ventity recently. It works beautifully:

Above, paradigm confidence, showing eight dominant paradigms as well as many smaller paradigms that never rise to dominance. They disappear when they run out of adherents. Below, puzzles under attack for the same paradigms.

Links to the source papers and more notes on the model are in the Vensim library entry. I think the dynamics are generalizable to other aspects of thinking in paradigms, like filter bubbles. The model is also a bit ‘meta’: Ventity as a distinct modeling paradigm that’s neither in the classical array-based world nor the code-based discrete agent world has struggled to win mindshare.

A minor note on use: the Run Config includes two setups: “replicate” and “random”. The “replicate” setup, which is inactive by default, launches paradigms at fixed times given by initialization data from a run of the Vensim version. This makes it possible to compare the simulations without divergence from randomness. However, the randomized run will normally be the more interesting way to work with this model.

The model (requires Ventity, which has a free trial license):

SciRev 15.zip

Computer Collates Climate Contrarian Claims

Coan et al. in Nature have an interesting text analysis of climate skeptics’ claims.

I’ve been at this long enough to notice that a few perennial favorites are missing, perhaps because they date from the 90s, prior to the dataset.

The big one is “temperature isn’t rising” or “the temperature record is wrong.” This has lots of moving parts. Back in the 90s, a key idea was that satellite MSU records showed falling temperatures, implying that the surface station record was contaminated by Urban Heat Island (UHI) effects. That didn’t end well, when it turned out that the UAH code had errors and the trend reversed when they were fixed.

Later UHI made a comeback when the SurfaceStations project crowdsourced an assessment of temperature station quality. Some turned out to be pretty bad. But again, when the dust settled, it turned out that the temperature trend was bigger, not smaller, when poor sites were excluded and TOD was corrected. This shouldn’t have been a surprise, because windy day analsyses and a dozen other things already ruled out UHI, but …

I consider this a reminder of the fact that part of the credibility of mainstream climate science arises not from the fact that models are so good, but because so many alternatives have been tried, and proved so bad, only to rise again and again.

Spreadsheets Strike Again

In this BBC podcast, stand-up mathematician Matt Parker explains the latest big spreadsheet screwup: overstating European productivity growth.

There are a bunch of killers in spreadsheets, but in this case the culprit was lack of a time axis concept, making it easy to misalign times for the GDP and labor variables. The interesting thing is that a spreadsheet’s strong suite – visibility of the numbers – didn’t help. Someone should have seen 22% productivity growth and thought, “that’s bonkers” – but perhaps expectations of a COVID19 rebound short-circuited the mental reality check.

ChatGPT struggles with pandemics

I decided to try out a trickier problem on ChatGPT: epidemiology.

This is tougher, because it requires some domain knowledge about terminology as well as some math. R0 itself is a slippery concept. It appears that ChatGPT is essentially equating R0 and the transmission rate; perhaps the result would be different had I used a different concept like force of infection.

Notice how ChatGPT is partly responding to my prodding, but stubbornly refuses to give up on the idea that the transmission rate needs to be less than R0, even though the two are not comparable.

Well, we got there in the end.