Just Say No to Complex Equations

Found in an old version of a project model:

IF THEN ELSE( First Time Work Flow[i,Proj,stage
] * TIME STEP >= ( Perceived First Time Scope UEC Work
[i,Proj,uec] + Unstarted Work[i,Proj,stage] )
:OR: Task Is Active[i,Proj,Workstage] = 0
:OR: avg density of OOS work[i,Proj,stage] > OOS density threshold,
Completed work still out of sequence[i,Proj,stage] / TIME STEP
+ new work out of sequence[i,Proj,stage] ,
MIN( Completed work still out of sequence[i,Proj,stage] / Minimum Time to Retrofit Prerequisites into OOS Work
+ new work out of sequence[i,Proj,stage],
new work in sequence[i,Proj,stage]
* ZIDZ( avg density of OOS work[i,Proj,stage],
1 - avg density of OOS work[i,Proj,stage] ) ) )

An equation like this needs to be broken into at least 3 or 4 human-readable chunks. In reviewing papers for the SD conference, I see similar constructions more often than I’d like.
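Here’s a rough sketch of what that decomposition might look like. It’s in Python rather than Vensim so it can be run standalone, with subscripts dropped. The function name and the intermediate names (work_exhausted, flush_rate, retrofit_rate, density_limited_rate) are my guesses at intent, not names from the original model, and zidz only approximates Vensim’s ZIDZ.

```python
def zidz(numerator, denominator):
    """Zero if divide by zero, in the spirit of Vensim's ZIDZ."""
    return 0.0 if denominator == 0 else numerator / denominator


def oos_rate(first_time_work_flow, time_step, perceived_scope, unstarted_work,
             task_is_active, avg_oos_density, oos_density_threshold,
             completed_oos_work, new_oos_work, new_in_sequence_work,
             min_retrofit_time):
    """Hypothetical name for the quantity the original equation computes."""
    # Chunk 1: the three conditions in the original :OR: clause, each named.
    work_exhausted = (first_time_work_flow * time_step
                      >= perceived_scope + unstarted_work)
    task_inactive = task_is_active == 0
    too_dense = avg_oos_density > oos_density_threshold
    flush_everything = work_exhausted or task_inactive or too_dense

    # Chunk 2: the "true" branch, which clears the stock within one time step.
    flush_rate = completed_oos_work / time_step + new_oos_work

    # Chunk 3: the first argument of MIN, limited by the retrofit time.
    retrofit_rate = completed_oos_work / min_retrofit_time + new_oos_work

    # Chunk 4: the second argument of MIN, limited by the density of OOS work.
    density_limited_rate = new_in_sequence_work * zidz(
        avg_oos_density, 1 - avg_oos_density)

    if flush_everything:
        return flush_rate
    return min(retrofit_rate, density_limited_rate)
```

Each chunk can now be plotted, unit-checked, and tested on its own, which is the whole point.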

Defining SD

Open Access Note by Asmeret Naugle, Saeed Langarudi, Timothy Clancy: https://doi.org/10.1002/sdr.1762

Abstract
A clear definition of system dynamics modeling can provide shared understanding and clarify the impact of the field. We introduce a set of characteristics that define quantitative system dynamics, selected to capture core philosophy, describe theoretical and practical principles, and apply to historical work but be flexible enough to remain relevant as the field progresses. The defining characteristics are: (1) models are based on causal feedback structure, (2) accumulations and delays are foundational, (3) models are equation-based, (4) concept of time is continuous, and (5) analysis focuses on feedback dynamics. We discuss the implications of these principles and use them to identify research opportunities in which the system dynamics field can advance. These research opportunities include causality, disaggregation, data science and AI, and contributing to scientific advancement. Progress in these areas has the potential to improve both the science and practice of system dynamics.

I shared some earlier thoughts here, but my refined view is in the SDR now:


Invited Commentaries by Tom Fiddaman, Josephine Kaviti Musango, Markus Schwaninger, Miriam Spano: https://doi.org/10.1002/sdr.1763

Model quality: draining the swamp for large models

In my high road post a while ago, I advocated “voluntary simplicity” as a way of avoiding a large model with an insurmountable burden of undiscovered rework.

Sometimes this is not a choice, because you’re asked to repurpose a large model for some related question. Maybe you didn’t build it, or maybe you did, but it’s time to pause and reflect before proceeding. It’s critical to determine where on the spectrum above the model lies – between the Vortex of Confusion and Nirvana.

I will assert from experience that a large model was almost always built with some narrow questions in mind, and that it was exercised mainly over parts of its state space relevant to those questions. It’s quite likely that a lot of bad behaviors lurk in other parts of the state space. When you repurpose the model those things are going to bite you.

If you start down the red road, “we’ll just add a little stuff for the new question…” you will find yourself in a world of hurt later. It’s absolutely essential that you first do some rigorous testing to establish what the limitations of the model might be in the new context.

The reason scope is so dangerous is that its effect on your ability to make quality progress is nonlinear. The number of interactions you have to manage, and with it the opportunities for errors and the complexity of corrections, is proportional to scope squared. The speed of your model scales as 1/scope, as does the time you have to pay attention to each variable.
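To put rough numbers on that, here is a tiny sketch. It assumes "interactions" means distinct pairs of variables, which is only one way to read the claim, and the function name is made up for illustration.

```python
def pairwise_interactions(n_variables):
    """Distinct pairs among n variables: n*(n-1)/2, roughly n squared over 2."""
    return n_variables * (n_variables - 1) // 2


for n in (10, 100, 1000):
    # Attention per variable shrinks as 1/n while the pairs grow as n squared.
    print(n, pairwise_interactions(n), 1 / n)
```

Growing a model tenfold, from 10 to 100 variables, multiplies the potential pairs by about 110 under this assumption.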

My tentative recipe for success, or at least survival:

1. Start early. Errors beget more errors, so the sooner you discover them, the sooner you can arrest that vicious cycle.

2. Be ruthless. Don’t test to see if the model can answer the new question; test to see if you can break it and get nonsense answers.

3. Use your tools. Pay attention to unit errors and runtime warnings. Write Reality Checks to automate tests. Set ranges on key variables to ensure that they’re within reason.

4. Isolate. Because of the nonlinear interaction problem, it’s hard to interpret tests on a full model. Instead, extract components and test them in isolation. You can do this by copy-pasting, or even easier in Vensim, by using Synthesim Overrides to modify inputs to steps, ramps, etc.

5. Don’t let go. When you find a problem, track it back to its root cause.

6. Document. Keep a lab notebook, or an email stream, or a todo list, so you don’t lose the thought when you have multiple issues to chase.

7. Be extreme. Pick a stock and kick it with a pulse or an override. Take all the people out of the factory, or all the ships out of the fleet. What happens? Does anything go negative? Do decisions remain consistent with goals?

8. Calibrate. Calibration against data can be a useful way to find issues, but it’s much slower than other options, so this is something to pursue late in the process. Also, remember that model-data gaps are equally likely to reveal a problem with the data.

9. Rebuild. If you’re finding a lot of problems, you may be better off starting clean, using the existing model as a conceptual guide, but reconsidering the detailed design of the implementation.

10. Submodel. It’s often hard to fix something inside the full plate of spaghetti. But you may be able to identify a solution in an external submodel, free of distractions, and then transplant it back into the host.

11. Reduce. If you can’t rebuild the full scope within available resources, cut things out. This may not be appetizing to the client, but it’s certainly better than delivering a fragile model that only works if you don’t touch it.

12. If you find you’re in a hole, stop digging. Don’t add any features until you have things under control, because they’ll only exacerbate the problems.

13. Communicate. Let the client, and your team, know what you’re up to, and why quality is more important than their cherished kitchen sink.
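As an illustration of item 7, here is a minimal sketch of an extreme conditions test on a one-stock inventory model. Everything in it (the model, the parameter values, the name simulate_inventory) is hypothetical. It takes all the workers out of the factory and asks whether inventory goes negative under two alternative shipment formulations.

```python
def simulate_inventory(workers, first_order_outflow, dt=0.25, horizon=20.0):
    """Euler-integrate a one-stock inventory model; return the inventory path."""
    productivity = 1.0    # units per worker per week (hypothetical)
    demand = 10.0         # units per week (hypothetical)
    min_ship_time = 1.0   # weeks needed to ship what is on hand (hypothetical)
    inventory = 20.0
    path = [inventory]
    for _ in range(int(horizon / dt)):
        production = workers * productivity
        if first_order_outflow:
            # Shipments are limited by what is actually in stock.
            shipments = min(demand, inventory / min_ship_time)
        else:
            # Naive formulation: ship to demand regardless of stock.
            shipments = demand
        inventory += (production - shipments) * dt
        path.append(inventory)
    return path


# Extreme condition: take all the people out of the factory.
naive = simulate_inventory(workers=0, first_order_outflow=False)
robust = simulate_inventory(workers=0, first_order_outflow=True)
print("naive minimum inventory:", min(naive))
print("robust minimum inventory:", min(robust))
```

The naive version looks fine in normal operation and only fails when kicked, which is why the kick is worth automating. The robust version stays nonnegative here only because dt is shorter than the minimum shipping time.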

Integration & Correlation

Claims by AI chatbots, engineers and Nobel prize winners notwithstanding, absence of correlation does not prove absence of causation, any more than presence of correlation proves presence of causation. Bard outlines several reasons stemming from noise and nonlinearity, but misses a key one: bathtub statistics.

Here’s a really simple example of how this reasoning can go wrong. Consider a system with a stock Y(t) that integrates a flow X(t):

X(t) = -t

Y(t) = ∫X(t)dt

We don’t need to simulate to solve for Y(t) = -1/2*t^2 +C.

Over the interval t=[-1,1] the X and Y time series look like this:

The X-Y relationship is parabolic, with correlation zero:

Zero correlation can’t mean “not causal” because we constructed the system to be causal. Even worse, the sign of the relationship depends on the subset of the interval you examine:
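For anyone who wants to check the numbers, here is a small sketch in plain Python. It takes C = 0 and samples t on a uniform grid, both arbitrary choices, and uses a hand-rolled pearson helper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


n = 201
t = [-1 + 2 * i / (n - 1) for i in range(n)]
x = [-ti for ti in t]             # the flow, X(t) = -t
y = [-0.5 * ti ** 2 for ti in t]  # the stock, Y(t) = -t^2/2 with C = 0

corr_all = pearson(x, y)
corr_neg_t = pearson([xi for xi, ti in zip(x, t) if ti < 0],
                     [yi for yi, ti in zip(y, t) if ti < 0])
corr_pos_t = pearson([xi for xi, ti in zip(x, t) if ti > 0],
                     [yi for yi, ti in zip(y, t) if ti > 0])
print(corr_all, corr_neg_t, corr_pos_t)
```

Over the whole interval the correlation vanishes by symmetry. On t < 0 it is strongly negative, and on t > 0 strongly positive, even though the causal structure never changes.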


This is not the only puzzling case. Consider instead:

X(t) = 1

Y(t) = ∫X(t)dt = t + C

In this case, X(t) has zero variance, and therefore zero covariance with Y. But Corr(X,Y) = Cov(X,Y)/(σ(X)σ(Y)), which is 0/0. What are we to make of that?

This pathology can also arise from feedback. Consider a thermostat that controls a heater that operates in two states (on or off). If the heater is fast, and the thermostat is sensitive with a narrow temperature band, then σ(temperature) will be near 0, even though the heater is cycling with σ(heater state)>0.
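A minimal sketch of the thermostat case, assuming a first-order room with a made-up time constant and heater strength and a simple Euler integration. The name simulate_thermostat and all of the numbers are illustrative only.

```python
from statistics import pstdev


def simulate_thermostat(setpoint=20.0, band=0.1, dt=0.01, steps=10_000):
    """Bang-bang heater control of a first-order room; returns both histories."""
    tau = 10.0          # room time constant (hypothetical)
    heater_gain = 40.0  # equilibrium temperature with the heater stuck on
    temp, heater_on = setpoint, 0
    temps, states = [], []
    for _ in range(steps):
        # Narrow hysteresis band around the setpoint.
        if temp < setpoint - band / 2:
            heater_on = 1
        elif temp > setpoint + band / 2:
            heater_on = 0
        # Outside temperature is taken as 0, so heat loss is temp / tau.
        temp += (heater_on * heater_gain - temp) / tau * dt
        temps.append(temp)
        states.append(heater_on)
    return temps, states


temps, states = simulate_thermostat()
print("sd(temperature):", pstdev(temps))
print("sd(heater state):", pstdev(states))
```

With these assumed parameters the heater spends about half its time on, while the temperature barely moves. A regression of temperature on heater state would find very little to work with, even though the heater is the only thing keeping the room warm.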

AI Chatbots on Causality

Having recently encountered some major causality train wrecks, I got curious about LLM “understanding” of causality. If AI chatbots are trained on the web corpus, and the web doesn’t “get” causality, there’s no reason to think that AI will make sense either.

TL;DR: ChatGPT and Bing utterly fail this test, for reasons that are evident in Google Bard’s surprisingly smart answer.

ChatGPT: FAIL

Bing: FAIL

Google Bard: PASS

Google gets strong marks for mentioning a bunch of reasons to expect that we might not find a correlation, even though x is known to cause y. I’d probably only give it a B+, because it neglected integration and feedback, but it’s a good answer that properly raises lots of doubts about simplistic views of causality.

Climate Causality Confusion

A newish set of papers (1. Theory (preprint); 2. Applications (preprint); 3. Extension) is making the rounds on the climate skeptic sites, with – ironically – little skepticism applied.

The claim is bold:

… According to the commonly assumed causality link, increased [CO2] causes a rise in T. However, recent developments cast doubts on this assumption by showing that this relationship is of the hen-or-egg type, or even unidirectional but opposite in direction to the commonly assumed one. These developments include an advanced theoretical framework for testing causality based on the stochastic evaluation of a potentially causal link between two processes via the notion of the impulse response function. …. All evidence resulting from the analyses suggests a unidirectional, potentially causal link with T as the cause and [CO2] as the effect.

Galileo complex seeps in when the authors claim that absence of correlation or impulse response from CO2 -> temperature proves absence of causality:

Clearly, the results […] suggest a (mono-directional) potentially causal system with T as the cause and [CO2] as the effect. Hence the common perception that increasing [CO2] causes increased T can be excluded as it violates the necessary condition for this causality direction.

Unfortunately, these claims are bogus. Here’s why.

The authors estimate impulse response functions between CO2 and temperature (and back), using the following formalism:


where g(h) is the response at lag h. As the authors point out, if

the IRF is zero for every lag except for the specific lag h0, then Equation (1) becomes y(t)=bx(t-h0) +v(t). This special case is equivalent to simply correlating y(t) with x(t-h0) at any time instance t. It is easy to find (cf. linear regression) that in this case the multiplicative constant is the correlation coefficient of y(t) and x(t-h0) multiplied by the ratio of the standard deviations of the two processes.

Now … anyone who claims to have an “advanced theoretical framework for testing causality” should be aware of the limitations of linear regression. There are several possible issues that might lead to misleading conclusions about causality.

Problem #1 here is bathtub statistics. Temperature integrates the radiative forcing from CO2 (and other things). This is not debatable – it’s physics. It’s old physics, and it’s experimental, not observational. If you question the existence of the effect, you’re basically questioning everything back to the Enlightenment. The implication is that no correlation is expected between CO2 and temperature, because integration breaks pattern matching. The authors purport to avoid integration by using first differences of temperature and CO2. But differencing both sides of the equation doesn’t solve the integration problem; it just kicks the can down the road. If y integrates x, then patterns of the integrals or derivatives of y and x won’t match either. Even worse, differencing filters out the signals of interest.
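A toy illustration of the differencing point, using a sinusoidal input rather than any real CO2 or temperature data. The stock y integrates x by construction, yet neither the levels nor the first differences are correlated. The sinusoid, the grid, and the pearson helper are all my assumptions.

```python
from math import pi, sin, sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


n, periods = 2000, 4
dt = 2 * pi * periods / n
x = [sin(i * dt) for i in range(n)]  # the flow

# Euler-integrate the flow into the stock, so y is caused by x by construction.
y, level = [], 0.0
for xi in x:
    y.append(level)
    level += xi * dt

dx = [b - a for a, b in zip(x, x[1:])]  # first differences of the flow
dy = [b - a for a, b in zip(y, y[1:])]  # first differences of the stock

corr_levels = pearson(x, y)
corr_diffs = pearson(dx, dy)
print(corr_levels, corr_diffs)
```

Differencing turns the sine and cosine into a cosine and sine. The 90 degree phase shift introduced by integration survives, so the correlation is still essentially zero.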

Problem #2 is that the model above assumes only equation error (the term v(t) on the right hand side). In most situations, especially dynamic systems, both the “independent” (a misnomer) and dependent variables are subject to measurement error, and this dilutes the correlation or slope of the regression line (aka attenuation bias), and therefore also the IRF in the authors’ framework. In the case of temperature, the problem is particularly acute, because temperature also integrates internal variability of the climate system (weather) and some of this variability is autocorrelated on long time scales (because for example oceans have long time constants). That means the effective number of data points is a lot less than the 60 years or 720 months you’d expect from simple counting.
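The attenuation effect is easy to reproduce in a static toy problem. The sketch below assumes a true slope of 2, unit-variance x, and unit-variance measurement noise on x, all arbitrary choices, so the estimated slope should shrink by about var(x)/(var(x) + var(noise)) = 1/2.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


random.seed(0)
n = 20_000
true_slope = 2.0
x_true = [random.gauss(0, 1) for _ in range(n)]
# Equation error only: this is the case the regression model assumes.
y = [true_slope * x + random.gauss(0, 0.5) for x in x_true]
# Measurement error on the "independent" variable as well.
x_observed = [x + random.gauss(0, 1) for x in x_true]

slope_clean = ols_slope(x_true, y)
slope_noisy = ols_slope(x_observed, y)
print(slope_clean, slope_noisy)
```

This is a static example. In a dynamic system with integrated, autocorrelated noise the situation is generally worse, not better.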

Dynamic variables are subject to other pathologies, generally under the heading of endogeneity bias, and related features with similar effects like omitted variable bias. Generalizing the approach to distributed lags in no way mitigates these. The bottom line is that absence of correlation doesn’t prove absence of causation.

Admittedly, even Nobel Prize winners can screw up claims about causality and correlation and estimate dynamic models with inappropriate methods. But causality confusion isn’t really a good way to get into that rarefied company.

I think methods purporting to assess causality exclusively from data are treacherous in general. The authors’ proposed method is provably wrong in some cases, including this one, as is Granger Causality. Even if you have pretty good assumptions, you’ll always find a system that violates them. That’s why it’s so important to take data-driven results with a grain of salt, and look for experimental control (where you can get it) and mechanistic explanations.

One way to tell if you’ve gotten causality wrong is when you “discover” mechanisms that are physically absurd. That happens on a spectacular scale in the third paper:

… we find Δ=23.5 and 8.1 Gt C/year, respectively, i.e., a total global increase in the respiration rate of Δ=31.6 Gt C/year. This rate, which is a result of natural processes, is 3.4 times greater than the CO2 emission by fossil fuel combustion (9.4 Gt C /year including cement production).

To put that in perspective, the authors propose a respiration flow that would put the biosphere about 30% out of balance. This implies a mass flow of trees harvested, soils destroyed, etc. 3.4 times as large as the planetary flow of fossil fuels. That would be about 4 cubic kilometers of wood, for example. In the face of the massive outflow from the biosphere, the 9.4 GtC/yr from fossil fuels went where, exactly? Extraordinary claims require extraordinary evidence, but the authors apparently haven’t pondered how these massive novel flows could be squared with other lines of evidence, like C isotopes, ocean pH, satellite CO2, and direct estimates of land use emissions.

This “insight” is used to construct a model of the temperature->CO2 process:

In this model, the trend in CO2 is explained almost exclusively by the mean temperature effect mu_v = alpha*(T-T0). That effect is entirely ad hoc, with no basis in the impulse response framework.

How do we get into this pickle? I think the simple answer is that the authors’ specification of the system is incomplete. As above, they define a causal system,

y(t) = ∫g1(h)x(t-h)dh

x(t) = ∫g2(h)y(t-h)dh

where g(.) is an impulse response function weighting lags h and the integral is over h from 0 to infinity (because only nonnegative lags are causal). In their implementation, x and y are first differences, so in their climate example, Δlog(CO2) and ΔTemp. In the estimation of the impulse lag structures g(.), the authors impose nonnegativity and (optionally) smoothness constraints.

A more complete specification is roughly:

Y = A*X + U

dX/dt = B*X + E

where

  • X is a vector of system states (e.g., CO2 and temperature)
  • Y is a vector of measurements (observed CO2 and temperature)
  • A and B are matrices of coefficients (this is a linear view of the system, but could easily be generalized to nonlinear functions)
  • E is driving noise perturbing the state, and therefore integrated into it
  • U is measurement error

My notation could be improved to consider covariance and state-dependent noise, though it’s not really necessary here. Fred Schweppe wrote all this out decades ago in Uncertain Dynamic Systems, and you can now find it in many texts like Stengel’s Optimal Control and Estimation. Dixit and Pindyck transplanted it to economics and David Peterson brought it to SD where it found its way into Vensim as the combination of Kalman filtering and optimization.

How does this avoid the pitfalls of the Koutsoyiannis et al. approach?

  • An element of X can integrate any other element of X, including itself.
  • There are no arbitrary restrictions (like nonnegativity) on the impulse response function.
  • The system model (A, B, and any nonlinear elements augmenting the framework) can incorporate a priori structural knowledge (e.g., physics).
  • Driving noise and measurement error are recognized and can be estimated along with everything else.
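To make the structure concrete, here is a minimal simulation sketch of the linear specification above, Y = A*X + U and dX/dt = B*X + E. It only simulates; it does not estimate anything, so there is no Kalman filter here. The function name, the Euler integration, the sqrt(dt) scaling of the driving noise, and the example matrices are all my assumptions for illustration.

```python
import random


def simulate_state_space(A, B, x0, dt, steps, sigma_e=0.0, sigma_u=0.0, seed=0):
    """Euler simulation of dX/dt = B*X + E with measurements Y = A*X + U."""
    rng = random.Random(seed)
    n = len(x0)
    x = list(x0)
    states, measurements = [], []
    for _ in range(steps):
        # Measurement: a linear map of the state plus measurement error U.
        y = [sum(A[i][j] * x[j] for j in range(n)) + rng.gauss(0, sigma_u)
             for i in range(len(A))]
        states.append(list(x))
        measurements.append(y)
        # State update: driving noise E is integrated into the state.
        dx = [sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + dx[i] * dt + sigma_e * dt ** 0.5 * rng.gauss(0, 1)
             for i in range(n)]
    return states, measurements


# Example: the second state integrates the first (B[1][0] = 1), nothing else.
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.0, 0.0], [1.0, 0.0]]
states, _ = simulate_state_space(A, B, x0=[1.0, 0.0], dt=0.01, steps=101)
noisy_states, noisy_meas = simulate_state_space(
    A, B, x0=[1.0, 0.0], dt=0.01, steps=101, sigma_e=0.1, sigma_u=0.05)
print(states[100], noisy_meas[100])
```

With the noise switched off, the second state is just the time integral of the first, which is the first bullet above in action. Switching on sigma_e and sigma_u shows how the measured series can diverge from the underlying states.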

Does the difference matter? I’ll leave that for a second post with some examples.

1975 – the CIA Evaluates SD

I just had a great time at the MIT SD Group for a Friday seminar. Lots to think about! Hopefully I can report on a few topics later.

In the meantime, Hesam Mahmoudi showed me a fun tidbit (via Navid Ghaffarzadegan). It’s a declassified CIA evaluation of MIT SD developments circa 1975, which one contributor refers to as the “Forrester cult.” I can’t believe I haven’t seen this before.

The first report is an interesting read, but mainly for its naïvely arrogant clever snark. The author appears to have completely missed the point.

I’d be interested to know what models specifically are “the same kinds” as industrial dynamics. AFAIK, economics was pretty solidly entrenched in econometrics at the time, and that had little to do with dynamics. Dynamic models were not unknown, including the Ramsey model (solved analytically), the Samuelson multiplier-accelerator (oops), and the hydraulic Phillips machine, but they were hardly mainstream.

Well, actually, we mostly use differential equations, because discrete time stinks. But that’s a minor point.

“Snake diagram” … oil … get it? Snake oil?

The author implements this as:

This doesn’t actually have much to do with SD. No one would formulate a market clearing mechanism this way, because the discrete time implementation has obvious flaws, including conflating the time step with the time scale of the price adjustment process. The initial condition for price is also omitted.

The “striking similarity between this model and good old supply-and-demand analysis” is clearly referencing the familiar plot of the intersection of supply and demand curves, which is generally about consumer surplus, taxation, technical shifts, etc. – nothing to do with dynamics. Instead, this is the cobweb model:

Obviously the dynamics lie on the supply and demand curves, but except for a trivial equilibrium point, it’s an oscillator, damped over half its parameter space, but explosive in the other half. This is basically a big exercise in DT error. The degree of damping depends on the relative slopes of the supply and demand curves, which is problematic because we don’t necessarily expect oscillatory behavior from models with stiff elasticities in the real world. The discrete time specification neglects the time constant of the adjustment process; slow adjustment is not the same as low elasticity, but the two are conflated here. This is actually a common problem in econometric models and might partly explain why short term and long term elasticity estimates overlap.
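A minimal sketch of the textbook cobweb model makes the damping condition concrete. It assumes linear supply responding to last period’s price and linear demand clearing the market each period. The intercepts, slopes, and function names are hypothetical, not taken from the CIA report.

```python
def cobweb_prices(supply_slope, demand_slope, p0=1.5, periods=10,
                  supply_intercept=0.0, demand_intercept=2.0):
    """Price path when supply responds to last period's price."""
    prices = [p0]
    for _ in range(periods):
        supply = supply_intercept + supply_slope * prices[-1]
        # Demand is demand_intercept - demand_slope * P; set it equal to supply.
        prices.append((demand_intercept - supply) / demand_slope)
    return prices


def equilibrium_price(supply_slope, demand_slope,
                      supply_intercept=0.0, demand_intercept=2.0):
    """Price at which supply and demand curves intersect."""
    return (demand_intercept - supply_intercept) / (supply_slope + demand_slope)


damped = cobweb_prices(supply_slope=0.5, demand_slope=1.0)
explosive = cobweb_prices(supply_slope=2.0, demand_slope=1.0)
p_star_damped = equilibrium_price(0.5, 1.0)
p_star_explosive = equilibrium_price(2.0, 1.0)
print(damped[-3:], explosive[-3:])
```

Each period the deviation from equilibrium is multiplied by -(supply slope / demand slope), so the sign flips every step and the amplitude shrinks or grows depending only on the ratio of slopes. Nothing in the formulation says how long a period is, which is the conflation of time step and adjustment time described above.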

Finally, we get the old red herring, that SD models are over-parameterized:

This is just silly, and also deeply ironic because omitted structure in the author’s proposed model (presumably to avoid having an explicit time constant) would seriously bias parameter estimates. It also embodies the common but wrong view that estimates from the particular data in an analysis are the only information in the universe that can inform a model.

It’s too bad the first author was too busy being dismissive to develop a proper critique, because we all might have learned something from that. Interestingly, though, a second author in the file came away with a different conclusion:

The Way Out

I’ve previously advised that a hyper-vigilant emphasis on model quality is the only viable path to a good model approaching the scope you hope for. That’s the green path.

However, it’s likely you will be captured by the red path at times. Client appetite for scope, desire for rapid perceived progress, the apparent ease of adding features, and existing models that aren’t as good as the box they came in advertised are all seductive.

Once you’ve overextended on scope, the standard vicious cycles in project models kick in:

  • errors beget errors
  • large models are slow and hard to test
  • errors mask other errors, making rework discovery more difficult
  • time pressure -> fatigue -> morale -> errors

So how do you get back on the righteous path? I think there are three options, but only 1.5 work.

The red path is tempting, because you can preserve the illusion of progress on scope. It will seldom work though. You’re unlikely to be able to inspect quality into an enlarged model later. You might progress to the right, but you won’t progress up. The orange path, a compromise of mild simplification and aggressive improvement, might work, but it’s going to hurt.

You’re better off pursuing the blue path, which essentially means reconstructing a better, simpler model, even at the expense of perceived functionality. Step 1: you’re in a hole, so stop digging. Productive things you might do include:

  • Suspend feature enhancements
  • Do extreme conditions tests – LOTS of them
  • Calibrate to data or run policy optimizations, as another kind of test (the algorithm will exploit weaknesses)
  • Dismantle sectors into standalone submodels that are easier to test and redesign
  • Aggressively clean up diagrams
  • Have a zero-tolerance policy for unit errors and runtime warnings
  • Document equations
  • Conduct team design reviews
  • Keep a trail of your model versions and components, so you can backtrack and later restore things you have to rip out

Once you’re back in shape, continue these disciplines. They’ll keep you on a path that stays far from the vortex.

What is accumulation?

The SD Society posted a definition of accumulation on Facebook, and it caught my eye.

This is from the SD Glossary, by David Ford.

accumulation (integration) : a gradual, non-instantaneous increase or decrease of a quantity over time. An accumulator is also referred to as a stock or level and represents the state of a system. To accumulate is the act of increasing and decreasing the size of a state variable (a stock) over time.

I wrote,

I’m not a fan of this definition. Accumulation is not necessarily gradual or non-instantaneous. In fact, it’s quite common to accumulate a flow pulse to produce an abrupt step in a stock. The key feature of accumulation is that it’s, well, cumulative. I’m at a loss for a way to express that without mentioning integration, which won’t help most people. Maybe someone can do better?

I think it’s telling that we don’t have ready words to describe accumulation. That might be a symptom, or a cause, of our problematic mental models about bathtub dynamics and bathtub statistics.

Resorting to “integration” isn’t really helpful, except to the mathematically inclined, who I think are not the audience for this kind of description.

The dictionary definition of “cumulative” turns out to be helpful:

increasing by successive additions

With that in mind, I’d propose something like:

  • accumulation : increasing by successive additions, or decreasing by successive subtractions.
  • stock (level) : A variable representing a persistent state in a system, which can be considered the memory of the system. Stocks change by accumulation of flows.
  • flow (rate): A variable that contributes to cumulative change in a stock over time. Flows represent activity or change in a system. A flow may represent the movement of physical quantities between stocks within a system boundary or across the model boundary and thereby into or out of the system (sinks and sources), or the rate of change of a nonphysical or intangible state.

Note that it’s hard to discuss accumulation without also discussing stocks and flows, so I’ve modified all three glossary entries.
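The "successive additions" idea, and the point that accumulation need not be gradual, can be shown in a few lines. This is only a sketch: stock_path is a made-up name, and it uses a simple Euler sum with dt = 1.

```python
from itertools import accumulate


def stock_path(initial, net_flows, dt=1.0):
    """Stock after each step: the initial value plus successive flow*dt additions."""
    return list(accumulate((flow * dt for flow in net_flows), initial=initial))


# A one-step flow pulse produces an abrupt, persistent step in the stock.
pulse = stock_path(0, [0, 0, 5, 0, 0])

# Negative net flows decrease the stock by successive subtractions.
drain = stock_path(10, [-2, -2, -2])
print(pulse, drain)
```

The stock remembers the pulse after the flow has returned to zero, which is the "memory of the system" in the stock definition above.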

What is SD?

Asmeret Naugle, Saeed Langarudi, and Timothy Clancy propose to define System Dynamics in a new paper.

The defining characteristics are: (1) models are based on causal feedback structure, (2) accumulations and delays are foundational, (3) models are equation-based, (4) concept of time is continuous, and (5) analysis focuses on feedback dynamics.

I like the paper, but … not so fast. I think more, and more flexible, criteria are needed. I would use the term “characterize” rather than “define.” The purpose should be to aid recognition of SD, and hopefully good SD, without drawing too tight a box around the field.

I particularly disagree with the inclusion of continuous time. Even though discrete time stinks, I think continuous time is a common but inessential feature, like continuous flows. Many models include occasional discrete events, and sometimes they’re important. Ventity’s actions are explicit discrete events between time steps, and they may modify model structure in ways that are key to an operational representation of reality.

My top-of-mind alternative framework looks like:

I think it’s also helpful to describe things that are not SD:

  • Intertemporal optimization or rational expectations representing behavior
  • Computable general equilibrium
  • Linear regression
  • Linear programming
  • Mixed integer programming
  • Social Network Analysis (static)
  • Discrete ABM
  • Discrete event simulation
  • Equilibrium
  • Simultaneity

Sometimes it’s easier to see the negative space, but there are exceptions to these rules.

I think it’s notable that both frameworks exclude a variety of qualitative systems thinking approaches, like group model building or elicitation methods that create CLDs rather than simulatable models. I’m a big tent fan, and certainly some of the exceptions are common at the SD conference, but does that make them SD?

I think behavior is another challenging feature to describe. In my mind, System Dynamics is almost synonymous with behavioral dynamics. If you’re building an economic model in which agents explicitly know the future (e.g., via intertemporal optimization), it’s not an SD model (though you might be using it as a comparison case for some SD purpose). Yet there’s a strong tradition of prize-winning biomedical models that lack behavior because they lack human agency. These are not easily distinguishable from what other fields might call ODEs or nonlinear dynamics. I would not want to eject those from the field, but neither would I want this to become our focus.

I’ll be interested to see how the conversation evolves on this.