Bad data, bad models

Baseline Scenario has a nice post on bad data:

To make a vast generalization, we live in a society where quantitative data are becoming more and more important. Some of this is because of the vast increase in the availability of data, which is itself largely due to computers. Some is because of the vast increase in the capacity to process data, which is also largely due to computers. …

But this comes with a problem. The problem is that we do not currently collect and scrub good enough data to support this recent fascination with numbers, and on top of that our brains are not wired to understand data. And if you have a lot riding on bad data that is poorly understood, then people will distort the data or find other ways to game the system to their advantage.

In spite of ubiquitous enterprise computing, bad data is the norm in my experience with corporate consulting. At one company, I had access to very extensive data on product pricing, promotion, advertising, placement, etc., but the information system archived everything inaccessibly on a rolling 3-year horizon. That made it impossible to see long term dynamics of brand equity, which was really the most fundamental driver of the firm’s success. Our experience with large projects includes instances where managers don’t want to know the true state of the system, and therefore refuse to collect or provide needed data – even when billions are at stake. And some firms jealously guard data within stovepipes – it’s hard to optimize the system when the finance group keeps the true product revenue stream secret in order to retain leverage over the marketing group.

People worry about garbage in, garbage out, but modeling can actually be the antidote to bad data. If you pay attention to quality, the process of building a model will reveal all kinds of gaps in data. We recently discovered that various sources of vehicle fleet data are in serious disagreement, because of double-counting of transactions and interstate sales, and undercounting of inspections. Once data issues are known, a model can be used to remove biases and filter noise (your GPS probably runs a Kalman filter to combine a simple physical model of your trajectory with noisy satellite measurements).
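To make the GPS aside concrete, here is a minimal sketch of a scalar Kalman filter in Python. It is not what any real receiver runs: the state is a single constant value rather than a trajectory, and the function name `kalman_1d`, the noise variances and the simulated measurements are all assumptions chosen for illustration.

```python
import random


def kalman_1d(measurements, meas_var, process_var, x0=0.0, p0=1000.0):
    """Scalar Kalman filter for a (nearly) constant state. Illustrative only."""
    x, p = x0, p0  # state estimate and its variance (diffuse prior by default)
    estimates = []
    for z in measurements:
        # Predict: the simple model says the state persists; uncertainty grows a bit.
        p = p + process_var
        # Update: weight the measurement by the Kalman gain.
        gain = p / (p + meas_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates


random.seed(0)
true_value = 10.0  # hypothetical quantity being measured
noisy = [true_value + random.gauss(0.0, 2.0) for _ in range(200)]
filtered = kalman_1d(noisy, meas_var=4.0, process_var=1e-4)

# Compare average absolute error over the last 50 samples.
raw_error = sum(abs(z - true_value) for z in noisy[-50:]) / 50
filtered_error = sum(abs(x - true_value) for x in filtered[-50:]) / 50
print(f"mean abs error, raw: {raw_error:.2f}, filtered: {filtered_error:.2f}")
```

The filter only helps because the model (here, "the value barely changes") is a reasonable description of reality, which is the point about causal structure in the next paragraph.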

Not just any model will do; causal models are important. It’s hard to discover that your data fails to observe physical laws or other reality checks with a model that permits negative cows and buries the acceleration of gravity in a regression coefficient.

The problem is, a lot of people have developed an immune response against models, because there are so many that don’t pay attention to quality and serve primarily propagandistic purposes. The only antidote for that, I think, is to teach modeling skills, or at least model consumption skills, so that people know the right questions to ask in order to separate the babies from the bathwater.

What do SD bibliography entries say about the health of the field?

Here’s a time series of the number of entries in the system dynamics bibliography:

SD bibliography entries

The peak was in 2000 with 420 entries. If you break out the types, it looks like the conference has saturated at about 250-300 papers, while journal, report and book publications have fallen off.

SD biblio detail

I suspect that some of the decline is explained by a long reporting lag, and some is “defection” of SD work into journals that aren’t captured in the bibliography (probably a good thing). It would be interesting to see a corrected series, to see what it says about the health of the field. The ideal way to do the correction would be to build a simple dynamic model of actual and measured publication rates, estimating the parameters from data (student project, anyone?).

How Many Pairs of Rabbits Are Created by One Pair in One Year?

The Fibonacci numbers are often illustrated geometrically, with spirals or square tilings, but the nautilus is not their origin. I recently learned that the sequence was first reported as the solution to a dynamic modeling thought experiment, posed by Leonardo Pisano (Fibonacci) in his 1202 masterpiece, Liber Abaci.

How Many Pairs of Rabbits Are Created by One Pair in One Year?

A certain man had one pair of rabbits together in a certain enclosed place, and one wishes to know how many are created from the pair in one year when it is the nature of them in a single month to bear another pair, and in the second month those born to bear also. Because the abovewritten pair in the first month bore, you will double it; there will be two pairs in one month. One of these, namely the first, bears in the second month, and thus there are in the second month 3 pairs; of these in one month two are pregnant, and in the third month 2 pairs of rabbits are born, and thus there are 5 pairs in the month; in this month 3 pairs are pregnant, and in the fourth month there are 8 pairs, of which 5 pairs bear another 5 pairs; these are added to the 8 pairs making 13 pairs in the fifth month; these 5 pairs that are born in this month do not mate in this month, but another 8 pairs are pregnant, and thus there are in the sixth month 21 pairs; to these are added the 13 pairs that are born in the seventh month; there will be 34 pairs in this month; to this are added the 21 pairs that are born in the eighth month; there will be 55 pairs in this month; to these are added the 34 pairs that are born in the ninth month; there will be 89 pairs in this month; to these are added again the 55 pairs that are born in the tenth month; there will be 144 pairs in this month; to these are added again the 89 pairs that are born in the eleventh month; there will be 233 pairs in this month.

Source: http://www.math.utah.edu/~beebe/software/java/fibonacci/liber-abaci.html

The solution is the famous Fibonacci sequence, which can be written as a recurrent series,

F(n) = F(n-1)+F(n-2), F(0)=F(1)=1

This can be directly implemented as a discrete time Vensim model:

Fibonacci Series

However, that representation is a little too abstract to immediately reveal the connection to rabbits. Instead, I prefer to revert to Fibonacci’s problem description to construct an operational representation:

Fibonacci Rabbits

Mature rabbit pairs are held in a stock (Fibonacci’s “certain enclosed place”), and they breed a new pair each month (i.e. the Reproduction Rate = 1/month). Modeling male-female pairs rather than individual rabbits neatly sidesteps concern over the gender mix. Importantly, there’s a one-month delay between birth and breeding (“in the second month those born to bear also”). That delay is captured by the Immature Pairs stock. Rabbits live forever in this thought experiment, so there’s no outflow from mature pairs.

You can see the relationship between the series and the stock-flow structure if you write down the discrete time representation of the model, ignoring units and assuming that the TIME STEP = Reproduction Rate = Maturation Time = 1:

Mature Pairs(t) = Mature Pairs(t-1) + Maturing
Immature Pairs(t) = Immature Pairs(t-1) + Reproducing - Maturing

Substituting Maturing = Immature Pairs and Reproducing = Mature Pairs,

Mature Pairs(t) = Mature Pairs(t-1) + Immature Pairs(t-1)
Immature Pairs(t) = Immature Pairs(t-1) + Mature Pairs(t-1) - Immature Pairs(t-1) = Mature Pairs(t-1)

So:

Mature Pairs(t) = Mature Pairs(t-1) + Mature Pairs(t-2)
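The same bookkeeping can be sketched in a few lines of Python. This is not the Vensim model itself: the function name `simulate_rabbits` and the initial conditions (one mature pair, no immature pairs) are assumptions made for illustration, with TIME STEP = Reproduction Rate = Maturation Time = 1 as above.

```python
def simulate_rabbits(months, immature=0, mature=1):
    """Step the two-stock model one month at a time; return (mature, immature) histories."""
    mature_history = [mature]
    immature_history = [immature]
    for _ in range(months):
        reproducing = mature   # each mature pair bears one new pair per month
        maturing = immature    # immature pairs mature after one month
        # Update both stocks from the flows computed at the previous step.
        immature = immature + reproducing - maturing
        mature = mature + maturing
        mature_history.append(mature)
        immature_history.append(immature)
    return mature_history, immature_history


mature_pairs, immature_pairs = simulate_rabbits(12)
total_pairs = [m + i for m, i in zip(mature_pairs, immature_pairs)]
print("mature pairs:", mature_pairs)
print("total pairs: ", total_pairs)
```

With these assumed initial conditions, Mature Pairs traces out the series with F(0)=F(1)=1, and the total of both stocks runs one step ahead of it (2, 3, 5, 8, …), matching the monthly counts in Fibonacci’s text.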

The resulting model has two feedback loops: a minor negative loop governing the Maturing of Immature Pairs, and a positive loop of rabbits Reproducing. The rabbit population tends to explode, due to the positive loop:

Fibonacci Growth

In four years, there are about as many rabbits as there are humans on earth, so that “certain enclosed place” had better be big. After an initial transient, the growth rate quickly settles down:

Fibonacci Growth Rate

Its steady-state value is 0.61803… (61.8%/month), which is the Golden Ratio conjugate. If you change the variable names, you can see the relationship to the tiling interpretation and the Golden Ratio:

Fibonacci Part Whole

Like anything that grows exponentially, the Fibonacci numbers get big fast. The hundredth is 354,224,848,179,261,915,075.
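Both claims are easy to check with exact integer arithmetic. The sketch below is plain Python rather than the Vensim model, and it assumes the series is indexed from F(0)=F(1)=1, so the hundredth number is the list element at index 99.

```python
import math

# Build the series with F(0) = F(1) = 1, using exact integers.
fib = [1, 1]
while len(fib) < 100:
    fib.append(fib[-1] + fib[-2])

# Fractional growth per month: (F(n+1) - F(n)) / F(n).
growth = [(b - a) / a for a, b in zip(fib, fib[1:])]
golden_conjugate = (math.sqrt(5) - 1) / 2

print("growth in the first months:", [round(g, 4) for g in growth[:8]])
print("late growth rate:          ", growth[-1])
print("golden ratio conjugate:    ", golden_conjugate)
print("hundredth number:          ", f"{fib[99]:,}")
```

The first few growth rates bounce around (0, 1, 0.5, 0.667, …) before settling, which is the initial transient.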

As before, we can play the eigenvector trick to suppress the growth mode. The system is described by the matrix:

-1 1
 1 0

which has eigenvalues {-1.618033988749895, 0.6180339887498949} – notice the appearance of the Golden Ratio. If we initialize the model with the eigenvector of the negative eigenvalue, {-0.8506508083520399, 0.5257311121191336}, we can get the bunny population under control, at least until numerical noise excites the growth mode, near time 25:

Fibonacci Stable

The problem is that we need negarabbits to do it, -0.85065 immature pairs initially, so this is not a physically realizable solution (which probably guarantees that it will soon be introduced in legislation).
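For anyone who wants to reproduce the eigenvector trick outside Vensim, here is a rough Python sketch. It assumes the state is ordered (Immature Pairs, Mature Pairs) and uses Euler integration with a time step of 1; the helper names are hypothetical, and the time at which rounding error re-excites the growth mode depends on the numerics, so it need not match the plot.

```python
import math

# System matrix for the state (immature, mature):
#   d(immature)/dt = -immature + mature   (reproducing minus maturing)
#   d(mature)/dt   =  immature            (maturing)
A = [[-1.0, 1.0],
     [1.0, 0.0]]

# Eigenvalues solve lambda^2 + lambda - 1 = 0.
lam_decay = (-1.0 - math.sqrt(5.0)) / 2.0   # about -1.618
lam_growth = (-1.0 + math.sqrt(5.0)) / 2.0  # about  0.618


def eigenvector(lam):
    """Unit eigenvector; the second row of A gives immature = lam * mature."""
    norm = math.hypot(lam, 1.0)
    return (lam / norm, 1.0 / norm)


def simulate(state, steps, dt=1.0):
    """Euler integration of dx/dt = A x (dt = 1 is an assumption)."""
    immature, mature = state
    path = [(immature, mature)]
    for _ in range(steps):
        d_immature = A[0][0] * immature + A[0][1] * mature
        d_mature = A[1][0] * immature + A[1][1] * mature
        immature, mature = immature + dt * d_immature, mature + dt * d_mature
        path.append((immature, mature))
    return path


start = eigenvector(lam_decay)
path = simulate(start, 10)
print("initial state:", start)
print("mature pairs over time:", [round(m, 4) for _, m in path])
```

Started exactly on that eigenvector, the population shrinks toward zero (alternating in sign with this time step) instead of exploding.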

I brought this up with my kids, and they immediately went to the physics of the problem: “Rabbits don’t live forever. How big is your cage? Do you have rabbit food? TONS of rabbit food? What if you have all males, or varying mixtures of males and females?”

It’s easy to generalize the structure to generate other sequences. For example, assuming that mature rabbits live for only two months yields the Padovan sequence. Its equivalent of the Golden Ratio is 1.3247…, i.e. the rabbit population grows more slowly at ~32%/month, as you’d expect since rabbit lives are shorter.
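As a quick check on that growth factor, here is a small Python sketch of the Padovan recurrence, P(n) = P(n-2) + P(n-3). It works from the recurrence alone, with starting values P(0)=P(1)=P(2)=1 assumed, and does not reproduce the modified stock-flow structure.

```python
# Padovan recurrence: P(n) = P(n-2) + P(n-3), assuming P(0) = P(1) = P(2) = 1.
padovan = [1, 1, 1]
while len(padovan) < 80:
    padovan.append(padovan[-2] + padovan[-3])

# The ratio of successive terms settles to the real root of r^3 = r + 1.
ratio = padovan[-1] / padovan[-2]

print("first terms:", padovan[:12])
print("limiting ratio:", ratio)
print("implied growth per month:", f"{(ratio - 1) * 100:.1f}%")
```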

The model’s in my library.

The simple dynamics of violence

There’s simple, as in Occam’s Razor, and there’s simple, as in village idiot.

There’s a noble tradition in economics of using simple thought experiments to illuminate important dynamics. Sometimes things go wrong, though, like this (from a blog I usually like):

… suppose that you have the choice of providing gruesome rhetoric that will increase the probability of a killing spree but will also increase the probability of the passage of Universal Health Insurance. Suppose using the Arizona case as a baseline we say that the average killing spree causes the death of 6 people. Then if your rhetoric is at least 6/22,000 = 1/3667 times as likely to produce the passage of universal health insurance as it is to induce a killing spree then you saved lives by engaging in fiery rhetoric.

http://modeledbehavior.com/2011/01/11/the-optimal-quantity-of-violent-rhetoric/

Here’s the apparent mental model behind this reasoning:

Linear Violence

It’s linear: use violent rhetoric, get the job done. There are two problems with this simple model. First, the sign of the relationships is ambiguous. I tend to suspect that anyone who needs to use violent rhetoric is probably a fanatic, who shouldn’t be making policy in the first place. Setting that aside, the bigger problem is that violence isn’t linear. Like potato chips, you can never have just one excessive outburst. Violent rhetoric escalates, and sometimes crosses into real violence. This is the classic escalation archetype:

Violence Escalation

In the escalation archetype, two sides struggle to maintain an advantage over each other. This creates two inner negative feedback loops, which together create a positive feedback loop (a figure-8 around the two negative loops). It’s interesting to note that, so far, the use of violent rhetoric is fairly one-sided – the escalation is happening within the political right (candidates vying for attention?) more than between left and right.
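The figure-8 behavior is easy to see in a toy simulation. The Python sketch below is purely illustrative: the function `escalate`, the 10% desired advantage and the 4-month adjustment time are made-up assumptions, not estimates of anything. Each side simply adjusts its own level toward a goal of slightly exceeding the other’s.

```python
def escalate(months, desired_advantage=0.1, adjustment_time=4.0, a=1.0, b=1.0):
    """Two goal-seeking (negative) loops that together form a positive loop."""
    history = [(a, b)]
    for _ in range(months):
        # Each side's goal: exceed the other's current level by a margin.
        target_a = b * (1.0 + desired_advantage)
        target_b = a * (1.0 + desired_advantage)
        # Each side closes the gap to its own goal gradually.
        a, b = (a + (target_a - a) / adjustment_time,
                b + (target_b - b) / adjustment_time)
        history.append((a, b))
    return history


run = escalate(48)
calm = escalate(48, desired_advantage=0.0)
print("with a desired advantage:", [round(x, 2) for x, _ in run[::12]])
print("without one:             ", [round(x, 2) for x, _ in calm[::12]])
```

Neither side’s rule looks explosive on its own, yet together they grow without limit; set the desired advantage to zero and nothing happens.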

There are many other positive feedbacks involved in the process, which exacerbate the direct escalation of language. Here are some speculative examples:

Violence Other Loops

The positive feedbacks around violent rhetoric create a societal trap, from which it may be difficult to extricate ourselves. If there’s a general systems insight about vicious cycles, it’s that the best policy is prevention – just don’t start down that road (if you doubt this, play the dollar auction or smoke some crack). Politicians who engage in violent rhetoric, or other races to the bottom of the intellectual barrel, risk starting a very destructive spiral:

violence Social

The bad news is that there’s no easy remedy for this behavior. Purveyors of violent rhetoric and their supporters need to self-reflect on the harm they do to society. The good news is that if public support for violent words and images reverses, the positive loops will help to repair the damage, and take us closer to a model of rational discourse for problem solving.

About that, there is at least a bit of wisdom in the article:

… if you genuinely care about the shooting death of six people then you ought to really, really care about endorsing wrong public policies which will result in the premature death of vastly more people. Hence you should devote yourself to actually discovering the right answers to these questions, rather than coming up with ad hoc rhetoric – violent or polite – in support of the policy you happened to have been attracted to first.

Deeper Lessons

From the mailbag, regarding my last post on storytelling and playing with systems,

I read your blog post from the 19th and wondered how you would compare what was presented in the blog in contrast with what Forrester said on pg 17, “Deeper Lessons” in the paper at

http://sysdyn.clexchange.org/sdep/papers/D-4434-3.pdf

That paper is Jay Forrester’s 1994 Learning Through System Dynamics as Preparation for the 21st Century. There’s a lot of good thinking in it. Unfortunately, the pdf is protected, so I have to give you a screenshot:

Forrester 4434 excerpt

The “important implications” that might be missed are things like, “we cause our own problems,” the notion that cause and effect are separated in time and space, and the differences between high- and low-leverage policies. (Go read the original for more.)

I see the blog and paper as complementary. Forrester’s deeper learnings are things that emerge from understanding the way things work, and that understanding – he argues – is developed through experimentation. This is also the rationale for management flight simulators and other games that teach systems principles. I think the guidance toward important implications that Forrester advocates is not much different than the kind of reporting the blog seeks – coverage that illuminates system structure and its consequences.

I don’t think stories per se are the problem. Sometimes they do degenerate into the equivalent of a bad history textbook – a litany of he-said-she-said opinions and events without any organizing structure. However, a story can be crafted to reveal the way things work, and systems thinkers often advocate the use of stories to present system insights. Perhaps we should be more cautious about that.

I think it’s very natural to drop from an operational description of a system to stories that are so much about people and events that they lose track of structure. For example, the article on the steam engine at howstuffworks, which ought to be structural if anything is, starts off with, “They were first invented by Thomas Newcomen in 1705, and James Watt (who we remember each time we talk about “60-watt light bulbs” and the such) made big improvements to steam engines in 1769.” If it’s hard for steam engines, which are well-understood, imagine how hard it is for a reporter to get beyond the words of a controversial topic like health care, where even experts are likely to have ambiguous and conflicting mental models.

The cautionary aspect of stories reminded me of a section in The Fifth Discipline, about what happens when you don’t convey systemic understanding:

Unfortunately, much more common are leaders who have a sense of purpose and genuine vision, but little ability to foster systemic understanding. Many great “charismatic” leaders, despite having a deep sense of purpose and vision, manage almost exclusively at the level of events. Such leaders deal in visions and crises, and little in between. They foster a lofty sense of purpose and mission. They create tremendous energy and enthusiasm. But, under their leadership, an organization caroms from crisis to crisis. Eventually, the worldview of people in the organization becomes dominated by events and reactiveness. People experience being jerked continually from one crisis to another; they have no control over their time, let alone their destiny. Eventually, this will breed deep cynicism about the vision, and about visions in general. The soil within which a vision must take root – the belief that we can influence our future – becomes poisoned.

Such “visionary crisis managers” often become tragic figures. Their tragedy stems from the depth and genuineness of their vision. They often are truly committed to noble aspirations. But noble aspirations are not enough to overcome systemic forces contrary to the vision. As the ecologists say, “Nature bats last.” Systemic forces will win out over the most noble vision if we do not learn how to recognize, work with, and gently mold those forces.

Subversive modeling

This is a technical post, so Julian Assange fans can tune out. I’m actually writing about source code management for Vensim models.

Occasionally procurements try to treat model development like software development. That can end in tragedy, because modeling isn’t the same as coding. However, there are many common attributes, and therefore software tools can be useful for modeling.

One typical challenge in model development is version control. Model development is iterative, and I typically go through fifty or more named model versions in the course of a project. C-ROADS is at v142 of its second life. It takes discipline to keep track of all those model iterations, especially if you’d like to be able to document changes along the way and recover old versions. Having a distributed team adds to the challenge.

The old school way


Storytelling and playing with systems

This journalist gets it:

Maybe journalists shouldn’t tell stories so much. Stories can be a great way of transmitting understanding about things that have happened. The trouble is that they are actually a very bad way of transmitting understanding about how things work. Many of the most important things people need to know about aren’t stories at all.

Our work as journalists involves crafting rewarding media experiences that people want to engage with. That’s what we do. For a story, that means settings, characters, a beginning, a muddle and an end. That’s what makes a good story.

But many things, like global climate change, aren’t stories. They’re issues that can manifest as stories in specific cases.

… the way that stories transmit understanding is only one way of doing so. When it comes to something else – a really big, national or world-spanning issue, often it’s not what happened that matters, so much as how things work.

…When it comes to understanding a system, though, the best way is to interact with it.

Play is a powerful way of learning. Of course the systems I’ve listed above are so big that people can’t play with them in reality. But as journalists we can create models that are accurate and instructive as ways of interactively transmitting understanding.

I use the word ‘play’ in its loosest sense here; one can ‘play’ with a model of a system the same way a mechanic ‘plays’ around with an engine when she’s not quite sure what might be wrong with it.

The act of interacting with a system – poking and prodding, and finding out how the system reacts to your changes – exposes system dynamics in a way nothing else can.

If this grabs you at all, take a look at the original – it includes some nice graphics and an interesting application to class in the UK. The endpoint of the forthcoming class experiment is something like a data visualization tool. It would be cool if they didn’t stop there, but actually created a way for people to explore the implications of different models accounting for the dynamics of class, as Climate Colab and Climate Interactive do with climate models.