ChatGPT does the Climate Bathtub

Following up on our earlier foray into AI conversations about dynamics, I decided to probe ChatGPT's understanding of bathtub dynamics a little further. First I repeated our earlier question about climate:

This is close, but note that it's suggesting that a decrease in emissions corresponds with a decrease in concentration. That's not necessarily true, because what matters is the balance of emissions and removals: as long as emissions exceed removals, the concentration keeps rising. ChatGPT seems to recognize the issue, but fails to account for it completely in its answer. My parameter choice turned out to be a little unfortunate, because a 50% reduction in CO2 emissions is fairly close to the boundary between rising and falling CO2 concentrations in the future.

I asked again with a smaller reduction in emissions. This should have an unambiguous effect: emissions would remain above removals, so the CO2 concentration would continue to rise, but at a slower rate.

This time the answer is a little better, but it’s not clear whether “lead to a reduction in the concentration of CO2 in the atmosphere” means a reduction relative to what would have happened otherwise, or relative to today’s concentration. Interestingly, ChatGPT does get that the emissions reduction doesn’t reduce temperature directly; it just slows the rate of increase.
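To make the bathtub arithmetic explicit, here's a minimal stock-and-flow sketch (my illustration, not part of the ChatGPT exchange; the numbers are arbitrary, not real CO2 data). The stock keeps rising as long as the inflow exceeds the outflow, even after the inflow is cut:

```python
# Minimal bathtub sketch: a stock with one inflow (emissions) and one
# outflow (removals). All values are illustrative, not real CO2 data.
concentration = 420.0   # stock (think ppm)
emissions = 5.0         # inflow per year
removals = 2.5          # outflow per year (held constant for simplicity)

for year in range(1, 31):
    if year == 10:
        emissions *= 0.8                    # a 20% emissions cut in year 10
    concentration += emissions - removals   # the stock integrates the net flow

print(f"Concentration after 30 years: {concentration:.0f}")
# Still well above the starting value: cutting the inflow only slows the rise.
```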

Modeling with ChatGPT

A couple weeks ago my wife started probing ChatGPT’s abilities. An early foray suggested that it didn’t entirely appreciate climate bathtub dynamics. She decided to start with a less controversial topic:

If there was a hole that went through the center of the moon, and I jumped in, how long would it take for me to come out the other side?

Initially, it's spectacularly wrong. It gets the time-to-distance formula for constant acceleration right, but it misapplies it. The answer is wrong by orders of magnitude, so it must be making a unit error or something similar. To us, the error is obvious. The moon is thousands of kilometers across, so how could you possibly traverse it in seconds, with only the moon's tiny gravity to accelerate you?
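A quick back-of-envelope check shows why an answer measured in seconds can't be right. Even the naive constant-acceleration formula, using the moon's diameter and surface gravity (values I'm supplying here, not taken from the chat), gives tens of minutes:

```python
import math

# Back-of-envelope check: time to fall a distance d from rest at constant
# acceleration a is t = sqrt(2*d/a). Values are approximate lunar figures.
d = 3.474e6                     # Moon's diameter, m
a = 1.62                        # Moon's surface gravity, m/s^2
t = math.sqrt(2 * d / a)
print(f"{t / 60:.0f} minutes")  # roughly 35 minutes, not seconds
```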

At the end here, we ask for the moon's diameter, because we started a race – I was building a Vensim model and my son was writing down the equations by hand, looking for a closed-form solution and (when the integral looked ugly) repeating the calculation in Matlab. ChatGPT proved to be a very quick way to look up things like the diameter of the moon – faster even than googling up the Wikipedia page.

Since it was clear that the constant-acceleration assumption was the problem, we tried to get it to correct itself. We hoped it would come up with F = m(me)*a = G*m(moon)*m(me)/R^2 and solve that.

Ahh … so the gigantic scale error comes from assuming a generic 100-meter hole, rather than a hole all the way through to the other side. Also, 9.8 m/s^2 is Earth's surface gravity, not the moon's.

Finally, it has arrived at the key concept needed to solve the problem: nonconstant acceleration, a = G*M(moon)/R^2 (where R varies with the jumper’s position in the hole).

Disappointingly, it crashed right at the crucial endpoint, but it’s already done most of the work to lay out the equations and collect the mass, radius and gravitational constant needed. It’s still stubbornly applying the constant acceleration formula at the end, but I must say that we were pretty impressed at this point.

In the same amount of time, the Vensim model was nearly done, with a bit of assistance on the input numbers from ChatGPT. There were initially a few glitches, like forgetting to reverse the sign of the gravitational force at the center of the moon. But once it worked, it was easily extensible to variations in planet size, starting above or below the surface, etc. Puzzlingly, the hand calculation was yielding a different answer (some kind of trivial hand computation error), but Matlab agreed with Vensim. Matlab was faster to code, but less interactive, and less safe because it didn't permit checking units.
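For the curious, here's a rough sketch of the kind of integration the models performed. This is my reconstruction, not the actual Vensim or Matlab model: it assumes a uniform-density moon, so that gravity inside the hole scales linearly with distance from the center (which also takes care of the sign flip at the center), and it uses a simple fixed-step integration:

```python
# Sketch of the jump-through-the-moon integration, assuming a uniform-density
# moon: inside the hole, a = -G*M*x/R^3 (proportional to distance from center).
G = 6.674e-11   # gravitational constant, m^3/(kg*s^2)
M = 7.342e22    # mass of the Moon, kg
R = 1.737e6     # radius of the Moon, m

x, v, t = R, 0.0, 0.0        # start at rest at the surface
dt = 0.1                     # time step, s

while x > -R:                # integrate until we emerge on the far side
    a = -G * M * x / R**3    # restoring acceleration toward the center
    v += a * dt
    x += v * dt
    t += dt

print(f"One-way traversal time: {t / 60:.1f} minutes")   # about 54 minutes
```

Under those assumptions the motion is simple harmonic, so the one-way time is pi*sqrt(R^3/(G*M)), a bit under an hour; a more realistic density profile would change the number somewhat.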

I’d hesitate to call this a success for the AI. It was a useful adjunct to a modeler who knew what they were doing. It was impressively fast at laying out the structure of the problem. But it was even faster at blurting out the wrong answer with an air of confidence. I would not want to fly in a plane designed by ChatGPT yet. To be fair, the system isn’t really designed to do physics, but a lot of reasoning about things like the economy or COVID requires some skills that it apparently doesn’t yet have.

Controlled Burn, Wood Stove, or Dumpster Fire?

The Twitter mess is a really interesting example of experimenting on a complex system in real time, apparently without much of a model.

I think fire is an interesting analogy (as long as you don’t take it too seriously, as with all analogies). There are different kinds of fires. A controlled burn improves forest health and lowers future risk by consuming dead wood. I think that’s what Musk is trying to accomplish. A fire in a wood stove makes nice heat, regulated by air flow. Controlled growth may be another Musk objective. An uncontrolled burn, or a burn where you don’t want it, is destructive.

I think the underlying parallel is that fire is driven by reinforcing feedback, and any organization has a lot of positive feedback loops. Success requires that the virtuous cycles are winning over the vicious cycles. If you get too many of the bad reinforcing feedbacks going, you have an uncontrolled burn in a bad place. This is often fatal, as at Sears.

Here are some of the loops I think are active at Twitter.

First, there’s the employee picture. I’ve divided them into two classes: over- and under-performing, which you might think of as identifying whether they produce more team value than their compensation indicates, or less. The dynamics I’ve drawn are somewhat incomplete, as I’ve focused on the “over-” side, omitting a number of parallel loops on the “under-” side for brevity.

There are some virtuous cycles you’d like to encourage (green). Hiring better people increases the perceived quality of colleagues, and makes it easier to recruit more good people. As you hire to increase work capacity, time pressure goes down, work quality goes up, you can book more work in the future, and use the revenue to hire more people. (This glosses over some features of IT work, like the fact that code is cumulative.)

There are also some loops you’d like to keep inactive, like the orange loop, which I’ve named for mass exodus, but might be thought of as amplifying departures due to the chaos and morale degradation from initial losses. A similar loop (not colored) is triggered when loss of high-performing employees increases the workload on the remainder, beyond their appetite.
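To make the reinforcing character of that loop concrete, here's a toy sketch (my own illustration, not a model from the post; all parameters are invented): departures raise the load on the remaining staff, which raises the departure rate, which raises the load further.

```python
# Toy "mass exodus" loop: attrition raises workload on the remaining staff,
# which in turn raises the attrition rate. All numbers are illustrative.
initial_staff = 7500.0
staff = initial_staff
normal_quit_fraction = 0.02          # monthly quit rate at normal workload

for month in range(1, 13):
    relative_load = initial_staff / staff          # same work, fewer people
    quit_fraction = normal_quit_fraction * relative_load ** 2
    staff -= staff * quit_fraction
    print(f"month {month:2d}: staff ~{staff:,.0f}")
```

Run it and the monthly losses accelerate rather than level off, which is the signature of a reinforcing loop.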

I suspect that Musk is counting on mass layoffs (red) to selectively eliminate the underperforming employees, and perhaps whole functional areas. This might work, except that I doubt it’s achievable without side effects, either demoralizing good employees, or removing functions that actually made unobserved vital contributions. I think he’s also counting on promises of future performance to enlist high performers in a crusade. But mass layoffs work against that by destroying the credibility of promises about the future – why stick around if you may be terminated for essentially random reasons?

Another key feature (not shown) is Musk’s apparent propensity to fire people for daring to contradict him. This seems like a good way to selectively fire high performers, and destroy the morale of the rest. Once you ignite the vicious cycles in this way, it’s hard to recover, because unlike forest detritus, the overperformers are more “flammable” than the underperformers – i.e., they have better prospects at other companies. Having the good people leave first is the opposite of what you want.

How far this fire spreads depends on how it impacts customers. The initial mass layoffs and reinforcing departures seem to have had a big impact on moderation capacity. That triggers a couple more vicious cycles. With moderation capacity down, bad actors last longer on the platform, increasing moderation workload. Higher workload and lower capacity decrease the quality of moderation, so the removal of bad accounts falls further (red). As this happens, other potential bad actors observe the opportunity and move into the breach (orange).

There are some aspects of this subsystem that I found difficult to deal with on a CLD. The primary questions are of “good and bad from whose perspective,” and whether greater intentional permissiveness offsets diminished moderation capacity. I think there are some legitimate arguments for permitting more latitude (“sunshine is the best remedy”) but also compelling arguments for continued proscription of certain behavior (violence for example). The credibility of policy changes so far, such as they can be determined, is undermined by the irony of the immediate crackdown on freedom to criticize the boss.

One key feature not shown here is how advertisers view all this. They're the revenue driver, after all. So far they seem to fear the increase in turbulence and controversy, even if it brings diversity and engagement. That's bad, because it's another vicious cycle (roughly, less revenue -> less capacity -> more conflict -> less revenue).

Account holders might become more of a revenue driver, but the initial rollout of the $8 verification idea was badly botched – presumably in part because of the simultaneous mass reduction in organizational capacity. This is unfortunate, because reducing anonymity might be a good way of promoting quality information through accountability.

The alternative, if Twitter collapses, is not entirely appetizing. Other big platforms aren’t exactly paragons of freedom or civility, and alternatives like Mastodon that provide more self-moderation capacity probably also enhance the insularity of filter bubbles.

I'm wondering again: (How) Should Systems Thinkers be on Social Media?

Related:

AI is killing us now

The danger of path-dependent information flows on the web

Encouraging Moderation


How many “thinkings” are there?

In my recent Data & Uncertainty talk, one slide augmented Barry Richmond’s list of 7 critical modes of thinking:

The four new items are Statistical, Solution, Behavioral, and Complexity thinking. The focus on solutions and behavioral decision making has been around for a long time in SD (and BDM is really part of Barry’s Operational Thinking).

On the other hand, statistical and complexity elements are not particularly widespread in SD. Certainly elements of both have been around from the beginning, but others – like explicit treatment of measurement errors, process noise, and Bayesian SD (statistical), or spatial, agent, and network dynamics (complexity) – are new. Both perhaps deserve some expansion into multiple concepts, but it would be neat to have a compact list of the most essential thinking modes across disciplines. What's your list?

Postdoc @ UofM in SD for Wildlife Management


This is an interesting opportunity. The topic is important, it’s a hard problem, and it’s interesting both on the techy side and the people/process side. You can get a little flavor of recent CWD work here. The team is smart & nice, and supports competent and ethical resource managers on the ground. Best of all, it’s in Montana, though you do have to be a Griz instead of a Cat.

That QR code (and this link) points to the full job listing.

Can AI do SD?

There are lots of interesting developments in machine learning for time series modeling, but can an AI write about systems sensibly? I decided to try a few free online AI content generators.

Using system dynamics modeling for project delay and disruption

1. System Dynamics Modeling (SDM)

System dynamics modeling is a mathematical model that helps us understand how systems work and how they interact with each other. SDM is useful for understanding complex systems and their behavior over time. In addition, it can help predict future outcomes based on current conditions.

2. Project Delay & Disruption

Project delays and disruptions occur frequently in projects. These events cause significant problems for organizations and individuals alike. A project delay occurs when a project is delayed past its original completion date. A project disruption occurs when a project is disrupted before it reaches its original completion date. Both types of project delays and disruptions can have negative effects on the success of a project.

3. Causes of Project Delays & Disruptions

There are many causes of project delays and disruptions. Some of these causes include:

• Poor planning

• Unclear requirements

• Lack of communication

• Poor management

• Miscommunication

• Poor leadership

• Poor team coordination

• Poor resource allocation

Well, we didn’t get the rework cycle or any feedback effects, but not an entirely useless effort.

Do climate tipping points signal the onset of Limits to Growth?

Climate change is a serious human problem that causes concern among many people. Some believe that climate change is caused by human activity and that it will bring about a catastrophic future. However, others believe that it is natural and that there are ways to prevent it from causing further damage. Regardless of whether climate change is caused by human activity or not, it has the potential to cause major damage to the planet. Many believe that we are approaching climate tipping points, which could have catastrophic consequences for humanity.

One reason why climate tipping points signal the onset of limits to growth is because they are a harbinger of future environmental disasters. These disasters would occur because of the way we have altered the climate and what we have done to the planet. For example, the Larsen B ice shelf in Antarctica broke away in 2002 and disintegrated within four years due to global warming. Climate tipping points signal the onset of limits to growth because they are a harbinger of future environmental disasters. Governments and environmental groups should prepare for future environmental disasters by investing in renewable resources and making changes in agriculture to reduce food insecurity.

Another reason why climate tipping points signal the onset of limits to growth is because they are a harbinger of future economic problems. Scientists have warned us about manmade climate change for years; however, most people ignore their warnings and keep on using fossil fuels anyway. This is causing major problems for our planet, including droughts, rising sea levels and more frequent natural disasters. Unfortunately, many people are paying little attention to the warnings; they think they can ignore them and avoid problems later. Governments and environmental groups should prepare for future economic problems by investing in renewable resources and making changes in agriculture to reduce food insecurity.

I have to say, the AI is making more sense than the South Dakota legislature.

Should hunters try to harvest more deer to help slow the growth of chronic wasting disease? Or is the cure worse than the disease, so we are merely killing deer now to avoid them dying later?

This is a case of a deer that had been found dead by an early hunter, then found by another hunter on the same day. These hunters were in the woods for several days, hunting deer. Then, for a week or more, the deer had eaten enough grass to cover itself, leaving the deer lying in its own feces. In the United States, this is called wasting. Chronic wasting is an infectious disease that, in humans, is caused by bacteria. When a person is infected with chronic wasting, they begin losing weight and become weak. They begin to smell bad, have sores on their body, lose their appetite, feel tired, listless and weak, having trouble standing up. Some people lose consciousness. Their physical…

I guess we still need modelers for some things.

Believing Exponential Growth

Verghese: You were prescient about the shape of the BA.5 variant and how that might look a couple of months before we saw it. What does your crystal ball show of what we can expect in the United Kingdom and the United States in terms of variants that have not yet emerged?

Pagel: The other thing that strikes me is that people still haven’t understood exponential growth 2.5 years in. With the BA.5 or BA.3 before it, or the first Omicron before that, people say, oh, how did you know? Well, it was doubling every week, and I projected forward. Then in 8 weeks, it’s dominant.

It’s not that hard. It’s just that people don’t believe it. Somehow people think, oh, well, it can’t happen. But what exactly is going to stop it? You have to have a mechanism to stop exponential growth at the moment when enough people have immunity. The moment doesn’t last very long, and then you get these repeated waves.

You have to have a mechanism that will stop it evolving, and I don’t see that. We’re not doing anything different to what we were doing a year ago or 6 months ago. So yes, it’s still evolving. There are still new variants shooting up all the time.

At the moment, none of these look devastating; we probably have at least 6 weeks’ breathing space. But another variant will come because I can’t see that we’re doing anything to stop it.

Medscape, We Are Failing to Use What We’ve Learned About COVID, Eric J. Topol, MD; Abraham Verghese, MD; Christina Pagel, PhD
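Pagel's arithmetic (doubling every week, dominant in about 8 weeks) is easy to reproduce. Here's a minimal sketch, with an assumed starting share of 0.5% (my number, purely illustrative):

```python
# Variant-share arithmetic: a variant whose share of cases doubles weekly.
# The starting share is an assumed example value.
share = 0.005                   # 0.5% of cases in week 0 (illustrative)
weeks = 0
while share < 0.5:              # "dominant" = more than half of cases
    odds = share / (1 - share)  # work in odds so the share stays below 100%
    odds *= 2                   # doubling every week
    share = odds / (1 + odds)
    weeks += 1

print(f"Dominant after ~{weeks} weeks")   # ~8 weeks
```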

Reading Between the Lines on Forrester’s Perspective on Data

I like Jay Forrester’s “Next 50 Years” reflection, except for his perspective on data:

I believe that fitting curves to past system data can be misleading.

OK, I’ll grant that fitting “curves” – as in simple regressions – may be a waste of time, but that’s a bit of a strawdog. The interesting questions are about fitting good dynamic models that pass all the usual structural tests as well as fitting data.

Also, the mere act of fitting a simple model doesn’t mislead; the mistake is believing the model. Simple fits can be extremely useful for exploratory analysis, even if you later discard the theories they imply.

Having a model give results that fit past data curves may impress a client.

True, though perhaps this is not the client you’d hope to have.

However, given a model with enough parameters to manipulate, one can cause any model to trace a set of past data curves.

This is Von Neumann's elephant. He's right, but I roll my eyes every time I hear this repeated – it's a true but useless statement, like "all models are wrong." Nonlinear dynamic models that pass SD quality checks usually don't have anywhere near the degrees of freedom needed to reproduce arbitrary behaviors.
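To illustrate the elephant point (my example, not Forrester's or von Neumann's): give a curve fitter enough free parameters and it will trace any set of points exactly, while telling you nothing about structure, and the extrapolation beyond the sample typically goes off the rails.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Fit a 9th-degree polynomial (10 free parameters) to 10 noisy points drawn
# from a simple decay process. The in-sample fit is essentially perfect, but
# that says nothing about structure. Data are synthetic, purely illustrative.
rng = np.random.default_rng(0)
t = np.arange(10.0)
data = 100 * np.exp(-0.3 * t) + rng.normal(0, 2, t.size)

p = Polynomial.fit(t, data, deg=9)   # as many parameters as data points
print("Max in-sample error:", np.max(np.abs(p(t) - data)))
print("'Forecast' at t = 12:", p(12.0))   # compare with ~2.7 from the true decay
```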

Doing so does not give greater assurance that the model contains the structure that is causing behavior in the real system.

On the other hand, if the model can’t fit the data, why would you think it does contain the structure that is causing the behavior in the real system?

Furthermore, the particular curves of past history are only a special case. The historical curves show how the system responded to one particular combination of random events impinging on the system. If the real system could be rerun, but with a different random environment, the data curves would be different even though the system under study and its essential dynamic character are the same.

This is certainly true. However, the problem is that the particular curve of history is the only one we have access to. Every other description of behavior we might use to test the model is intuitively stylized – and we all know how reliable intuition in complex systems can be, right?

Exactly matching a historical time series is a weak indicator of model usefulness.

Definitely.

One must be alert to the possibility that adjusting model parameters to force a fit to history may push those parameters outside of plausible values as judged by other available information.

This problem is easily managed by assigning strong priors to known parameters in the model calibration process.
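For what it's worth, here's a minimal sketch of what I mean (my illustration; the "model", data, and priors are all invented): the calibration objective combines the data fit with a penalty for parameters that stray from independently known values, which amounts to a maximum a posteriori estimate.

```python
import numpy as np
from scipy.optimize import minimize

# Calibration with priors: minimize data-fit error plus a penalty for
# parameters that stray from independently known values. Everything here
# (model, data, priors) is synthetic, purely for illustration.
rng = np.random.default_rng(1)
t = np.arange(0, 10, 0.5)
observed = 100 * np.exp(-0.3 * t) + rng.normal(0, 2, t.size)

def simulate(params):
    x0, k = params
    return x0 * np.exp(-k * t)          # stand-in for the dynamic model

prior_mean = np.array([100.0, 0.25])    # independently known parameter values
prior_sd = np.array([10.0, 0.05])       # how strongly we trust them

def objective(params):
    fit = np.sum((simulate(params) - observed) ** 2) / (2 * 2.0 ** 2)
    prior = np.sum(((params - prior_mean) / prior_sd) ** 2) / 2
    return fit + prior

result = minimize(objective, x0=prior_mean)
print("Calibrated parameters:", result.x)
```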

Historical data is valuable in showing the characteristic behavior of the real system and a modeler should aspire to have a model that shows the same kind of behavior. For example, business cycle studies reveal a large amount of information about the average lead and lag relationships among variables. A business-cycle model should show similar average relative timing. We should not want the model to exactly recreate a sample of history but rather that it exhibit the kinds of behavior being experienced in the real system.

As above, how do we know what kinds of behavior are being experienced, if we only have access to one particular history? I think this comment implies the existence of intuitive data from other exemplars of the same system. If that’s true, perhaps we should codify those as reference modes and treat them like data.

Again, yielding to what the client wants may be the easy road, but it will undermine the powerful contributions that system dynamics can make.

This is true in so many ways. The client often wants too much detail, or too many scenarios, or too many exogenous influences. Any of these can obstruct learning, or break the budget.

These pages are full of data-free conceptual models that I think are valuable. But I also love data, so I have a different bottom line:

  • Data and calibration by themselves can’t make the model worse – you’re adding additional information to the testing process, which is good.
  • However, time devoted to data and calibration has an opportunity cost, which can be very high. So, you have to weigh time spent on the data against time spent on communication, theory development, robustness testing, scenario exploration, sensitivity analysis, etc.
  • That time spent on data is not all wasted, because it’s a good excuse to talk to people about the system, may reveal features that no one suspected, and can contribute to storytelling about the solution later.
  • Data is also a useful complement to talking to people about the system. Managers say they're doing X. Does the data back that up, or does it suggest they're really doing Y? Such cases may be revealed by structural problems, but calibration gives you a sharper lens for detecting them.
  • If the model doesn’t fit the data, it might be the data that is wrong or misinterpreted, and this may be an important insight about a measurement system that’s driving the system in the wrong direction.
  • If you can’t reproduce history, you have some explaining to do. You may be able to convince yourself that the model behavior replicates the essence of the problem, superimposed on some useless noise that you’d rather not reproduce. Can you convince others of this?