Climate Policy Effectiveness, Pricing and Causality

A new paper by Stechemesser et al. in Science evaluates the large suite of climate policies implemented across 41 countries over the past 25 years.

Editor’s summary

It is easy for countries to say they will reduce their emissions of greenhouse gases, but these statements do not mean that the policies they adopt will be effective. Stechemesser et al. evaluated 1500 climate policies that have been implemented over the past 25 years and identified the 63 most successful ones. Some of those successes involved rarely studied policies and unappreciated combinations. This work illustrates the kinds of policy efforts that are needed to close the emissions gaps in various economic sectors. —Jesse Smith

Abstract

Meeting the Paris Agreement’s climate targets necessitates better knowledge about which climate policies work in reducing emissions at the necessary scale. We provide a global, systematic ex post evaluation to identify policy combinations that have led to large emission reductions out of 1500 climate policies implemented between 1998 and 2022 across 41 countries from six continents. Our approach integrates a comprehensive climate policy database with a machine learning–based extension of the common difference-in-differences approach. We identified 63 successful policy interventions with total emission reductions between 0.6 billion and 1.8 billion metric tonnes CO2. Our insights on effective but rarely studied policy combinations highlight the important role of price-based instruments in well-designed policy mixes and the policy efforts necessary for closing the emissions gap.

Effective policies from Stechemesser et al.

Emil Dimanchev has a useful critique on the platform formerly known as Twitter. I think there are two key points. First, the method may have a hard time capturing gradual changes. This is basically a bathtub statistics problem: policies are implemented as step changes, but the effect may involve one or more integrations (from implementation lags, slow capital turnover, R&D pipeline delays, etc.). The structural problem is probably exacerbated by a high level of noise. The bottom line is that some real effects may not be readily detectable.
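As a minimal sketch of why that matters, consider a hypothetical 10% emission cut delivered either as an abrupt step or phased in through a first-order adjustment, both buried in ordinary year-to-year noise (all numbers here are invented for illustration):

import numpy as np

rng = np.random.default_rng(0)
t = np.arange(-10, 11)                    # years, policy starts at t = 0
cut = 0.10                                # hypothetical 10% emission reduction
noise = rng.normal(0, 0.03, t.size)       # ~3% year-to-year noise

step = np.where(t >= 0, 1 - cut, 1.0)                      # abrupt effect
gradual = 1 - cut * (1 - np.exp(-np.maximum(t, 0) / 5))    # 5-year phase-in

for name, series in [("step", step + noise), ("gradual", gradual + noise)]:
    before = series[t < 0].mean()
    after = series[(t >= 0) & (t < 3)].mean()
    print(f"{name:8s} apparent break in first 3 years: {before - after:.3f}")

The step shows roughly the full 0.10 cut within a few years; the phased-in policy shows only a small fraction of it, easily lost in the noise, even though the long-run effect is identical.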

ED’s second key point is essentially a variant of “correlation is not causation”:

To understand effectiveness of a policy (or of a policy mix), we are interested in the probability (P) that it reduces emissions (our hypothesis, H) when implemented (our condition, E). In statistics, we denote that P(H|E). But the authors do something very different.

Instead, the authors take all cases (arbitrarily) defined as effective (E) and then estimate how often a policy was implemented around that time. That’s P(E|H). The two shouldn’t be conflated (though Tversky and Kahneman showed people often make that mistake).

An example of this conflation is when people conclude from the paper that a policy mix needs CO2 pricing to be effective. But the data merely show that in most of their historical emission breaks, pricing was part of the policy mix, a statement of P(E|H).

Concluding from the paper that pricing increases the probability of a break in emissions, or P(H|E), is exactly like saying that you should play a musical instrument to increase your chances of winning a Nobel prize because most Nobel laureates play a musical instrument.
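To make the conflation concrete, here is the Nobel/instrument analogy written out with Bayes' rule, using numbers I've invented purely for illustration:

# P(H|E) = P(E|H) * P(H) / P(E); all numbers are made up for illustration.
p_E_given_H = 0.9    # P(plays an instrument | Nobel laureate): high
p_H = 1e-6           # P(Nobel laureate): tiny base rate
p_E = 0.3            # P(plays an instrument): common in the general population

p_H_given_E = p_E_given_H * p_H / p_E
print(p_H_given_E)   # ~3e-6: playing an instrument barely moves your Nobel odds

The same arithmetic applies to pricing in policy mixes: the fact that pricing appears in most detected emission breaks says little about P(break | pricing) until you bring in the base rates.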

I have mixed feelings about this argument. It’s correct in principle, but I think it’s incomplete, because there are strong mechanistic arguments for the effectiveness of some climate policies, which should complement the probabilistic reasoning here. I think getting to this mechanistic view is actually what ED is driving at in his prescription:

An empirical investigation of effectiveness would involve looking at all cases an instrument or a mix was implemented, and estimating the CO2 reductions it caused while controlling for all confounding variables. That’s obviously very hard. Again, that’s not what the paper does.

To me, this means using an a priori energy model to detect policy effects, rather than an abstract ML method. When the IEA climate policy database first came out, I thought that would be a really cool project. Then I thought about how much work it would be, so that will have to wait for another time. But given the abundance of mechanistic arguments favoring some policies, I can’t help leaning towards the view that “correlation is not causation – but it’s a good start”. Here’s a suggestive view of the results, stratifying the price and non-price initiatives, with the pies roughly sized by absolute numbers of policies:

Let’s suppose, in the worst case, that the effects observed in Stechemesser et al. are simply random luck. Why should the non-price policies be associated with effects so much less frequently? The paper may be weak evidence, but I think it’s not inconsistent with the idea – backed by models – that a lot of climate and energy policies are seriously flawed (CAFE standards and Energy Star come to mind) or simply fluffy feel-good greenwash.

Just Say No to Complex Equations

Found in an old version of a project model:

IF THEN ELSE(
    First Time Work Flow[i,Proj,stage] * TIME STEP
        >= ( Perceived First Time Scope UEC Work[i,Proj,uec] + Unstarted Work[i,Proj,stage] )
    :OR: Task Is Active[i,Proj,Workstage] = 0
    :OR: avg density of OOS work[i,Proj,stage] > OOS density threshold,
    Completed work still out of sequence[i,Proj,stage] / TIME STEP
        + new work out of sequence[i,Proj,stage],
    MIN(
        Completed work still out of sequence[i,Proj,stage]
            / Minimum Time to Retrofit Prerequisites into OOS Work
            + new work out of sequence[i,Proj,stage],
        new work in sequence[i,Proj,stage]
            * ZIDZ( avg density of OOS work[i,Proj,stage],
                1 - avg density of OOS work[i,Proj,stage] ) ) )

An equation like this needs to be broken into at least 3 or 4 human-readable chunks. In reviewing papers for the SD conference, I see similar constructions more often than I’d like.
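The cure is generic: name the intermediate concepts so each piece can be read and tested on its own. Here’s a rough sketch of the same logic in Python; the variable names are my paraphrases of the original, not the model’s actual structure:

def oos_completion_rate(first_time_flow, perceived_scope, unstarted_work,
                        task_active, oos_density, oos_threshold,
                        completed_oos, new_oos, new_in_sequence,
                        min_retrofit_time, time_step):
    # Condition: the stage is finished, inactive, or too clogged with
    # out-of-sequence (OOS) work to proceed normally.
    work_exhausted = first_time_flow * time_step >= perceived_scope + unstarted_work
    too_dense = oos_density > oos_threshold
    flush_stage = work_exhausted or not task_active or too_dense

    # Outcomes, each a named, separately testable flow.
    flush_everything = completed_oos / time_step + new_oos
    retrofit_limited = completed_oos / min_retrofit_time + new_oos
    oos_ratio = oos_density / (1 - oos_density) if oos_density < 1 else 0.0  # ZIDZ
    density_limited = new_in_sequence * oos_ratio

    return flush_everything if flush_stage else min(retrofit_limited, density_limited)

In Vensim, the equivalent move is to pull each of those named terms out as its own auxiliary variable.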

ChatGPT plays the Beer Game

Well, sort of. I decided to use the Beer Game inventory management framework as a test of reasoning in a dynamic situation. I changed “beer” to “televisions” for easy typing, and to avoid triggering any safety provisions on alcohol, but otherwise I tried to stay close to the classic game.

You

I run a retail store. Most weeks, we sell 4 televisions. We typically have 8 televisions in inventory. Each week, we order more televisions to restock the ones sold and maintain our inventory. It takes 2 weeks to receive a television we order. It costs $1 to keep a television in stock for a week, but the opportunity cost of lost sales if no televisions are in stock is $2. This week we sold 8 televisions. How many should we order this week?

ChatGPT

To determine how many televisions you should order this week, let’s break down the process step by step:

1. *Current Inventory:* You have 8 televisions in inventory.

2. *Demand:* This week, you sold 8 televisions.

3. *Lead Time:* It takes 2 weeks to receive a television after ordering.

4. *Costs:*

* Holding Cost: $1 per television per week.
* Stockout Cost: $2 per television per week.

5. *Objective:* Minimize the total cost, considering both holding and stockout costs.

Given that it takes 2 weeks for new stock to arrive, you need to cover the demand for the next 2 weeks.

Let’s calculate:

* *Demand for the next 2 weeks:* 8 televisions per week * 2 weeks = 16 televisions

Now, let’s calculate the total cost for different ordering quantities:

*Scenario 1:* Ordering 16 televisions

* Holding cost for 2 weeks: $1 * 16 televisions * 2 weeks = $32
* No stockout cost
* Total Cost: $32

*Scenario 2:* Ordering 17 televisions

* Holding cost for 2 weeks: $1 * 17 televisions * 2 weeks = $34
* Stockout cost for 1 television for 1 week: $2 * 1 television * 1 week = $2
* Total Cost: $34 + $2 = $36

*Scenario 3:* Ordering 18 televisions

* Holding cost for 2 weeks: $1 * 18 televisions * 2 weeks = $36
* Stockout cost for 2 televisions for 1 week: $2 * 2 televisions * 1 week = $4
* Total Cost: $36 + $4 = $40

Based on these calculations, ordering 16 televisions would minimize your total cost. Therefore, you should order 16 televisions this week.

Verdict: The problem statement is OK, but GPT seems to have taken this as a one-shot static situation. It appears to be neglecting the possibility of reordering in following weeks, and therefore trying to cover the entire shipping pipeline with the order. Also, it has misunderstood the stockout cost. In any case, it’s overreacting.

You

16 televisions sounds like a lot. What if I just replace the 8 sold by ordering 8 televisions? Or should I order more than 8 in order to replace depleted inventory?

ChatGPT

Ordering just 8 televisions to replace the ones sold might not be the best strategy because it doesn’t account for potential stockouts and the time it takes for new stock to arrive.

Let’s analyze the scenario if you only order 8 televisions:

1. *Demand:* You sold 8 televisions this week.
2. *Lead Time:* It takes 2 weeks to receive new televisions after ordering.
3. *Inventory Depletion:* If you only order 8 televisions, you won’t have enough stock to cover demand for the next 2 weeks.
4. *Potential Stockouts:* There’s a risk of running out of stock, leading to lost sales and stockout costs.

Given these factors, if you only order 8 televisions, you’ll likely face stockouts and incur additional costs. It’s better to order enough to cover the demand for the next 2 weeks and maintain a buffer to minimize the risk of stockouts.

Ordering 16 televisions allows you to cover the demand for the next 2 weeks and maintain your inventory level, considering the lead time for new stock to arrive. It helps minimize the total cost by balancing holding costs and potential stockout costs.

Therefore, ordering 16 televisions would be a more prudent approach to ensure you meet customer demand while minimizing costs in the long run.

Verdict: I’m not quite sure what it’s “thinking” here. For one thing, what I order this week will have no effect on stock for the next two weeks, due to the shipping delay. So if I’m going to be out of stock next week, that’s inevitable. I’ll try to help it out …

You

I forgot to mention that previously we have been ordering 4 televisions per week to replace those sold. That means there are already 8 televisions being shipped to us – 4 arriving next week, and 4 the week after. Does that change the answer?

ChatGPT

Thanks for the clarification. Considering that you have 4 televisions already on order for each of the next 2 weeks, that changes the situation. Let’s adjust the analysis:

1. *Current Inventory:* You have 8 televisions in inventory.
2. *Demand:* You sold 8 televisions this week.
3. *Lead Time:* You have 4 televisions arriving next week and another 4 arriving the week after.
4. *Total Demand to Cover:* 8 televisions sold this week + 8 televisions to cover the next 2 weeks = 16 televisions.
5. *Costs:* Holding cost is $1 per television per week, and the stockout cost is $2 per television per week.

With 16 televisions to cover the next 2 weeks’ demand, you have a couple of options:

*Ordering 8 televisions:* This would cover the immediate demand for the next week, but you would still have to cover the following week’s demand with the existing order. However, you risk stockouts for the second week’s demand.

*Ordering 16 televisions:* This would cover both weeks’ demand without any risk of stockouts. You would not need to hold excess inventory beyond what’s necessary for the next 2 weeks.

Given the potential stockout costs and the need to maintain inventory to cover future demand, ordering 16 televisions seems like the safer option. It ensures that you can meet customer demand for the next 2 weeks without incurring additional costs due to stockouts.

Verdict: I think it still doesn’t get the sequencing of orders and shipments, probably because it’s not really managing with an understanding of bathtub dynamics. I’ll see if it does better on the big picture:
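For contrast, the standard anchor-and-adjust stock management rule handles this situation cleanly: order enough to cover expected demand, plus corrections for the inventory gap and the supply line gap. Here’s a minimal sketch; the one-week adjustment times are my own illustrative choices:

def weekly_order(expected_demand, inventory, desired_inventory,
                 supply_line, lead_time, inv_adj_time=1.0, sl_adj_time=1.0):
    """Anchor-and-adjust ordering with explicit supply line accounting."""
    desired_supply_line = expected_demand * lead_time
    inventory_gap = (desired_inventory - inventory) / inv_adj_time
    supply_line_gap = (desired_supply_line - supply_line) / sl_adj_time
    return max(0.0, expected_demand + inventory_gap + supply_line_gap)

# The week in question: usual demand of 4/week, shelf emptied by the 8-unit sale,
# desired inventory of 8, and 8 sets already in the 2-week pipeline (4 + 4).
print(weekly_order(expected_demand=4, inventory=0, desired_inventory=8,
                   supply_line=8, lead_time=2))   # -> 12.0

With those assumptions the rule says order 12, not 16: replace expected sales and rebuild the shelf, but don’t double-count what’s already in transit.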

You

I compromised and ordered 12 televisions this week. However, the supplier informed me that they are experiencing stockouts, and it might take 4 weeks to receive televisions in the future. How should I respond by changing my ordering next week?

ChatGPT

Given the new information about the supplier experiencing stockouts and the potential increase in lead time to 4 weeks, you’ll need to adjust your ordering strategy accordingly to mitigate the risk of running out of stock.

Here’s how you can respond by changing your ordering next week:

1. *Assess Current Inventory:* Evaluate how many televisions you have in stock after accounting for this week’s sales and the 12 televisions you ordered.

2. *Estimate Future Demand:* Consider the expected demand for the next 4 weeks based on historical sales data and any other relevant factors.

3. *Calculate Total Demand:* Add up the expected demand for each of the next 4 weeks.

4. *Account for Lead Time:* Since the lead time has increased to 4 weeks, you’ll need to order enough televisions to cover the expected demand during this extended lead time period.

5. *Consider Safety Stock:* Given the uncertainty with the supplier’s stockouts, you may want to increase your safety stock level to buffer against any unexpected fluctuations in demand or delays in delivery.

6. *Place Order:* Based on your assessment of current inventory, future demand, and the need for safety stock, place an order with the supplier for the necessary quantity of televisions to cover the next 4 weeks.

7. *Monitor and Adjust:* Continuously monitor inventory levels, sales data, and any updates from the supplier regarding lead times. Be prepared to adjust your ordering strategy as needed to maintain optimal inventory levels and minimize the risk of stockouts.

Considering the potential for longer lead times and the need to maintain adequate inventory levels during this period of uncertainty, you may want to order a higher quantity of televisions next week to ensure that you can meet customer demand and avoid stockouts.

Verdict: It’s retreated to a list of considerations, rather than proposing specific order numbers, which is probably good. It does mention “lead time” but it doesn’t quite come out and say “keep track of the product in the order pipeline,” which is a key omission. Also, it’s arbitrarily looking at a 4-week horizon.

You

I’ve been talking to some other retail stores, and it turns out that they’re all ordering extra televisions to build up inventory to offset the longer shipping lead times and supplier stockouts. Should I be concerned about this?

ChatGPT

It’s understandable that other retail stores are adjusting their ordering strategies to build up inventory in response to longer shipping lead times and supplier stockouts. However, whether you should be concerned depends on various factors specific to your own retail store, including your sales patterns, storage capacity, financial situation, and the level of risk you’re comfortable with.

Here are some considerations to help you evaluate whether you should be concerned:

1. *Sales Patterns:* Analyze your historical sales data to understand your typical demand patterns. If your sales are relatively stable and predictable, you may not need to build up as much inventory compared to stores with more fluctuating demand.

2. *Storage Capacity:* Consider whether you have sufficient storage space to accommodate extra inventory. Building up inventory can tie up valuable space and increase holding costs if you’re unable to sell the televisions quickly.

3. *Financial Situation:* Evaluate your cash flow and financial resources. Building up inventory requires additional investment, so ensure that you have the financial capacity to handle the increased inventory levels without negatively impacting your cash flow or liquidity.

4. *Supplier Relationships:* Assess the reliability of your supplier and the likelihood of continued stockouts or longer lead times. If you have concerns about your supplier’s ability to fulfill orders consistently, it may be prudent to build up inventory as a precautionary measure.

5. *Competitive Landscape:* Consider how your competitors’ actions might impact your business. If other retail stores are building up inventory to meet customer demand more effectively, you may need to adjust your strategy to remain competitive.

Ultimately, the decision to build up inventory should be based on a careful assessment of your unique circumstances and risk tolerance. While it’s important to be aware of industry trends and competitive dynamics, your focus should be on optimizing inventory levels to meet customer demand efficiently while minimizing costs and risks to your business. If you’re unsure about the best course of action, consulting with a financial advisor or industry expert may provide valuable insights tailored to your specific situation.

Verdict: The list of considerations 1-5 is not bad, but it doesn’t really address the important question, which is whether the bullwhip effect may come around to bite us.

You

Should I be concerned about the possibility that the current supply shortage will eventually be replaced by a supply glut, when manufacturer capacity catches up with these orders? I don’t want to get stuck with a lot of televisions I can’t sell, because suppliers interpreted strategic phantom ordering as real demand.

ChatGPT

Concerns about the possibility of a supply glut once manufacturer capacity catches up with current orders are valid, especially if there’s been a surge in orders due to strategic stockpiling by retailers. Being stuck with excess inventory can tie up capital, lead to storage issues, and potentially result in markdowns or losses if the televisions can’t be sold at full price.

To address this concern, consider the following strategies:

1. *Monitor Industry Trends:* Stay informed about industry trends, including updates on manufacturer capacity, production levels, and market demand. Keep track of news and reports from reliable sources to anticipate potential shifts in supply and demand dynamics.

2. *Maintain Flexibility:* Maintain flexibility in your ordering strategy to adapt to changing market conditions. Avoid overcommitting to large orders or long-term contracts that could leave you vulnerable to a sudden downturn in demand or an oversupply situation.

3. *Diversify Product Range:* Diversify your product range to reduce reliance on any single category or supplier. By offering a broader selection of products, you can mitigate the impact of supply fluctuations in any particular category and capitalize on emerging trends.

4. *Strengthen Supplier Relationships:* Cultivate strong relationships with your suppliers to facilitate open communication and collaboration. Work closely with your suppliers to share information about inventory levels, sales forecasts, and market trends, enabling them to better anticipate your needs and adjust production accordingly.

5. *Implement Just-in-Time Inventory:* Consider implementing a just-in-time inventory system to minimize excess inventory and optimize stock levels based on real-time demand signals. This approach can help you reduce carrying costs and mitigate the risk of overstocking.

6. *Monitor Inventory Levels:* Continuously monitor your inventory levels and sales performance to identify any signs of excess inventory or slow-moving products. Take proactive measures, such as offering promotions or discounts, to clear out excess inventory before it becomes a liability.

By staying vigilant, maintaining flexibility, and fostering strong relationships with suppliers, you can mitigate the risks associated with potential supply gluts and position your retail store to adapt effectively to changing market conditions.

Verdict: Again, this is a good list, and several of these items might be useful strategic advice for the supply glut problem. #6 mentions inventory monitoring, which is good, but there’s nothing about supply line monitoring, which is bad, though improving communication with suppliers (#4) and switching to JIT (#5) might help. Still, I feel it’s wise to heed the statement at the bottom of every page:

ChatGPT can make mistakes. Check important info.

AI for modeling – what (not) to do

Ali Akhavan and Mohammad Jalali have a nice new article in the SDR on the use of AI (LLMs) to complement simulation modeling.

Generative AI and simulation modeling: how should you (not) use large language models like ChatGPT

Ali Akhavan, Mohammad S. Jalali

Abstract

Generative Artificial Intelligence (AI) tools, such as Large Language Models (LLMs) and chatbots like ChatGPT, hold promise for advancing simulation modeling. Despite their growing prominence and associated debates, there remains a gap in comprehending the potential of generative AI in this field and a lack of guidelines for its effective deployment. This article endeavors to bridge these gaps. We discuss the applications of ChatGPT through an example of modeling COVID-19’s impact on economic growth in the United States. However, our guidelines are generic and can be applied to a broader range of generative AI tools. Our work presents a systematic approach for integrating generative AI across the simulation research continuum, from problem articulation to insight derivation and documentation, independent of the specific simulation modeling method. We emphasize while these tools offer enhancements in refining ideas and expediting processes, they should complement rather than replace critical thinking inherent to research.

It’s loaded with useful examples of prompts and responses.

I haven’t really digested this yet, but I’m looking forward to writing about it. In the meantime, I’m very interested to hear your take in the comments.

How Beauty Dies

I’m lucky to live in a beautiful place, with lots of wildlife, open spaces, relative quiet and dark skies, and clean water. Keeping it that way is expensive. The cost is not money; it’s the time it takes to keep people from loving it to death, or simply exploiting it until the essence is lost.

Fifty years ago, some far-sighted residents realized that development would eventually ruin the area, so they created conservation zoning designed to limit density and preserve natural resources. For a long time, that structure attracted people who were more interested in beauty than money, and the rules enjoyed strong support.

However, there are some side effects. First, the preservation of beauty in one locale, while everywhere else turns to burbs and billboards, raises property values. Second, the low density needed to preserve resources creates scarcity, again raising property values. High valuations attract people who come primarily for the money, not for the beauty itself.

Every person who moves in exploits a little or a lot of the remaining resources, with the money people leaning towards extraction of as much as possible. As the remaining beauty degrades, fewer people are attracted for the beauty, and the nature of the place becomes more commercial.

A reinforcing feedback drives the system to a tipping point, when the money people outnumber the beauty people. Then they erode the regulations, and paradise is lost to a development free-for-all.

Defining SD

Open Access Note by Asmeret Naugle, Saeed Langarudi, Timothy Clancy: https://doi.org/10.1002/sdr.1762

Abstract
A clear definition of system dynamics modeling can provide shared understanding and clarify the impact of the field. We introduce a set of characteristics that define quantitative system dynamics, selected to capture core philosophy, describe theoretical and practical principles, and apply to historical work but be flexible enough to remain relevant as the field progresses. The defining characteristics are: (1) models are based on causal feedback structure, (2) accumulations and delays are foundational, (3) models are equation-based, (4) concept of time is continuous, and (5) analysis focuses on feedback dynamics. We discuss the implications of these principles and use them to identify research opportunities in which the system dynamics field can advance. These research opportunities include causality, disaggregation, data science and AI, and contributing to scientific advancement. Progress in these areas has the potential to improve both the science and practice of system dynamics.

I shared some earlier thoughts here, but my refined view is in the SDR now:


Invited Commentaries by Tom Fiddaman, Josephine Kaviti Musango, Markus Schwaninger, Miriam Spano: https://doi.org/10.1002/sdr.1763

More reasons to love emissions pricing

I was flipping through a recent Tech Review, and it seemed like every other article was an unwitting argument for emissions pricing. Two examples:

Job title of the future: carbon accountant

We need carbon engineers who know how to make emissions go away more than we need bean counters to tally them. Are we also going to have nitrogen accountants, and PFAS accountants, and embodied methane in iridium accountants, and … ? That way lies insanity.

The fact is, if carbon had a nontrivial price attached at the wellhead, it would pervade the economy, and we’d already have carbon accountants. They’re called accountants.

More importantly, behind those accountants is an entire infrastructure of payment systems that enforces conservation of money. You can only cheat an accounting system for so long, before the cash runs out. We can’t possibly construct parallel systems providing the same robustness for every externality we’re interested in.

Here’s what we know about lab-grown meat and climate change

Realistically, no matter how hard we try to work out the relative emissions of natural and laboratory cows, the confidence bounds on the answer will remain wide until the technology is used at scale.

We can’t guide that scaling process by assessments that are already out of date when they’re published. Lab meat innovators need a landscape in which carbon is priced into their inputs, so they can make the right choices along the way.

Model quality: draining the swamp for large models

In my high road post a while ago, I advocated “voluntary simplicity” as a way of avoiding a large model with an insurmountable burden of undiscovered rework.

Sometimes this is not a choice, because you’re asked to repurpose a large model for some related question. Maybe you didn’t build it, or maybe you did, but it’s time to pause and reflect before proceeding. It’s critical to determine where on the spectrum above the model lies – between the Vortex of Confusion and Nirvana.

I will assert from experience that a large model was almost always built with some narrow questions in mind, and that it was exercised mainly over the parts of its state space relevant to those questions. It’s quite likely that a lot of bad behaviors lurk in other parts of the state space. When you repurpose the model, those things are going to bite you.

If you start down the red road (“we’ll just add a little stuff for the new question…”), you will find yourself in a world of hurt later. It’s absolutely essential that you first do some rigorous testing to establish what the limitations of the model might be in the new context.

The reason scope is so dangerous is that its effect on your ability to make quality progress is nonlinear. The number of interactions you have to manage, and therefore the opportunities for errors and the complexity of corrections, is proportional to scope squared. The speed of your model falls off as 1/scope, as does the time you have to pay attention to each variable.

My tentative recipe for success, or at least survival:

1. Start early. Errors beget more errors, so the sooner you discover them, the sooner you can arrest that vicious cycle.

2. Be ruthless. Don’t test to see if the model can answer the new question; test to see if you can break it and get nonsense answers.

3. Use your tools. Pay attention to unit errors and runtime warnings. Write Reality Checks to automate tests. Set ranges on key variables to ensure that they’re within reason.

4. Isolate. Because of the nonlinear interaction problem, it’s hard to interpret tests on a full model. Instead, extract components and test them in isolation. You can do this by copy-pasting, or even easier in Vensim, by using Synthesim Overrides to modify inputs to steps, ramps, etc.

5. Don’t let go. When you find a problem, track it back to its root cause.

6. Document. Keep a lab notebook, or an email stream, or a todo list, so you don’t lose the thought when you have multiple issues to chase.

7. Be extreme. Pick a stock and kick it with a pulse or an override. Take all the people out of the factory, or all the ships out of the fleet. What happens? Does anything go negative? Do decisions remain consistent with goals? (See the sketch after this list.)

8. Calibrate. Calibration against data can be a useful way to find issues, but it’s much slower than other options, so this is something to pursue late in the process. Also, remember that model-data gaps are equally likely to reveal a problem with the data.

9. Rebuild. If you’re finding a lot of problems, you may be better off starting clean, using the existing model as a conceptual guide, but reconsidering the detailed design of the implementation.

10. Submodel. It’s often hard to fix something inside the full plate of spaghetti. But you may be able to identify a solution in an external submodel, free of distractions, and then transplant it back into the host.

11. Reduce. If you can’t rebuild the full scope within available resources, cut things out. This may not be appetizing to the client, but it’s certainly better than delivering a fragile model that only works if you don’t touch it.

12. If you find you’re in a hole, stop digging. Don’t add any features until you have things under control, because they’ll only exacerbate the problems.

13. Communicate. Let the client, and your team, know what you’re up to, and why quality is more important than their cherished kitchen sink.
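To illustrate item 7, here’s a toy extreme-conditions test on a hypothetical backlog model, written in Python rather than Vensim; the point is just the pattern of zeroing a stock and asserting that nothing goes nonsensical:

def simulate_backlog(workforce0, backlog0, weeks=52, dt=1.0,
                     arrival_rate=10.0, productivity=2.0):
    # Toy model: completion = workforce * productivity, capped by available backlog.
    workforce, backlog = workforce0, backlog0
    for _ in range(int(weeks / dt)):
        completion = min(workforce * productivity, backlog / dt)
        backlog += (arrival_rate - completion) * dt
        assert backlog >= 0, "backlog went negative -- formulation bug"
        assert completion >= 0, "negative completion -- formulation bug"
    return backlog

# Extreme test: take all the people out of the factory.
print(simulate_backlog(workforce0=0.0, backlog0=100.0))  # backlog should just accumulate

In Vensim the same test is a Synthesim override or a Reality Check, but the discipline is identical: pick an extreme, predict the outcome, and check it.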

Integration & Correlation

Claims by AI chatbots, engineers and Nobel prize winners notwithstanding, absence of correlation does not prove absence of causation, any more than presence of correlation proves presence of causation. Bard outlines several reasons, from noise to nonlinearity, but misses a key one: bathtub statistics.

Here’s a really simple example of how this reasoning can go wrong. Consider a system with a stock Y(t) that integrates a flow X(t):

X(t) = -t

Y(t) = ∫X(t)dt

We don’t need to simulate to solve for Y(t) = -1/2*t^2 + C.

Over the interval t=[-1,1] the X and Y time series look like this:

The X-Y relationship is parabolic, with correlation zero:

Zero correlation can’t mean “not causal” because we constructed the system to be causal. Even worse, the sign of the relationship depends on the subset of the interval you examine:
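A quick numerical check of both claims (zero correlation over the symmetric interval, opposite signs on the two halves), using the X and Y defined above with C = 0:

import numpy as np

t = np.linspace(-1, 1, 2001)
X = -t
Y = -0.5 * t**2                         # the integral of X, with C = 0

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print(corr(X, Y))                       # ~0 over [-1, 1]
print(corr(X[t <= 0], Y[t <= 0]))       # negative on [-1, 0]
print(corr(X[t >= 0], Y[t >= 0]))       # positive on [0, 1]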


This is not the only puzzling case. Consider instead:

X(t) = 1

Y(t) = ∫X(t)dt = t + C

In this case, X(t) has zero variance. But Corr(X,Y) = Cov(X,Y)/(σ(X)σ(Y)), which is 0/0. What are we to make of that?
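The degenerate case is just as easy to reproduce numerically; the correlation comes back undefined:

import numpy as np

t = np.linspace(0, 1, 101)
X = np.ones_like(t)              # X(t) = 1, zero variance
Y = t                            # Y(t) = ∫X(t)dt with C = 0

print(np.corrcoef(X, Y)[0, 1])   # nan (the 0/0), plus a runtime warning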

This pathology can also arise from feedback. Consider a thermostat that controls a heater that operates in two states (on or off). If the heater is fast, and the thermostat is sensitive with a narrow temperature band, then σ(temperature) will be near 0, even though the heater is cycling with σ(heater state)>0.

AI Chatbots on Causality

Having recently encountered some major causality train wrecks, I got curious about LLM “understanding” of causality. If AI chatbots are trained on the web corpus, and the web doesn’t “get” causality, there’s no reason to think that AI will make sense either.

TL;DR: ChatGPT and Bing utterly fail this test, for reasons that are evident in Google Bard’s surprisingly smart answer.

ChatGPT: FAIL

Bing: FAIL

Google Bard: PASS

Google gets strong marks for mentioning a bunch of reasons to expect that we might not find a correlation, even though x is known to cause y. I’d probably only give it a B+, because it neglected integration and feedback, but it’s a good answer that properly raises lots of doubts about simplistic views of causality.