Defense Against the Black Box

Baseline Scenario has a nice account of the role of Excel in the London Whale (aka Voldemort) blowup.

… To summarize: JPMorgan’s Chief Investment Office needed a new value-at-risk (VaR) model for the synthetic credit portfolio (the one that blew up) and assigned a quantitative whiz (“a London-based quantitative expert, mathematician and model developer” who previously worked at a company that built analytical models) to create it. The new model “operated through a series of Excel spreadsheets, which had to be completed manually, by a process of copying and pasting data from one spreadsheet to another.” The internal Model Review Group identified this problem as well as a few others, but approved the model, while saying that it should be automated and another significant flaw should be fixed. After the London Whale trade blew up, the Model Review Group discovered that the model had not been automated and found several other errors. Most spectacularly,

“After subtracting the old rate from the new rate, the spreadsheet divided by their sum instead of their average, as the modeler had intended. This error likely had the effect of muting volatility by a factor of two and of lowering the VaR . . .”

Microsoft Excel is one of the greatest, most powerful, most important software applications of all time. …

As a consequence, Excel is everywhere you look in the business world—especially in areas where people are adding up numbers a lot, like marketing, business development, sales, and, yes, finance. …

But while Excel the program is reasonably robust, the spreadsheets that people create with Excel are incredibly fragile. There is no way to trace where your data come from, there’s no audit trail (so you can overtype numbers and not know it), and there’s no easy way to test spreadsheets, for starters. The biggest problem is that anyone can create Excel spreadsheets—badly. Because it’s so easy to use, the creation of even important spreadsheets is not restricted to people who understand programming and do it in a methodical, well-documented way.

This is why the JPMorgan VaR model is the rule, not the exception: manual data entry, manual copy-and-paste, and formula errors. This is another important reason why you should pause whenever you hear that banks’ quantitative experts are smarter than Einstein, or that sophisticated risk management technology can protect banks from blowing up. …

System Dynamics has a strong tradition of model quality control, dating all the way back to its origins in Industrial Dynamics. Some of it is embodied in software, while other bits are merely habits and traditions. If the London Whale model had been an SD model, would the crucial VaR error have occurred? Since the model might not have employed much feedback, one might also ask, had it been built with SD software, like Vensim, would the error have occurred?

There are multiple lines of defense against model errors:

  • Seeing the numbers. This is Excel’s strong suit. It apparently didn’t help in this case though.
  • Separation of model and data. A model is a structure that one can populate with different sets of parameters and data. In Excel, the structure and the data are intermingled, so it’s tough to avoid accidental replacement of structure (an equation) by data (a number), and tough to compare versions of models or model runs to recover differences. Vensim is pretty good at that. But it’s not clear that such comparisons would have revealed the VaR structure error.
  • Checking units of measure. When I was a TA for the MIT SD course, I graded a LOT of student models. I think units checking would have caught about a third of conceptual errors. In this case though, the sum and average of a variable have the same units, so it wouldn’t have helped.
  • Fit to data. Generally, people rely far too much on R^2, and too little on other quality checks, but the VaR error is exactly the kind of problem that might be revealed by comparison to history. However, if the trade was novel, there might not be any relevant data to use. In any case, there’s no real obstacle to evaluating fit in Excel, though the general difficulties of building time series models are an issue where time is relevant.
  • Conservation laws. SD practitioners are generally encouraged to observe conservation of people, money, material, etc. Software supports this with the graphical stock-flow convention, though it ought to be possible to do more. Excel doesn’t provide any help in this department, though it’s not clear whether it would have mattered to the Whale trade model.
  • Extreme conditions tests. “Kicking the tires” of models has been a good idea since the beginning. This is an ingrained SD habit, and Vensim provides Reality Check™ to automate it. It’s not clear that this would have revealed the VaR sum vs. average error, because that’s a matter of numerical sensitivity that might not reveal itself as a noticeable change in behavior. But I bet it would reveal lots of other problems with the model boundary and limitations to validity of relationships.
  • Abstraction. System Dynamics focuses on variables as containers for time series, and distinguishes stocks (state variables) from flows and other auxiliary conversions. Most SD languages also include some kind of array facility, like subscripts in Vensim, for declarative processing of detail complexity. Excel basically lacks such conventions, except for named ranges that are infrequently used. Time and other dimensions exist spatially as row-column layout. This means that an Excel model is full of a lot of extraneous material for handling dynamics, is stuck in discrete time, can’t be checked for DT stability, and requires a lot of manual row-column fill operations to express temporal phenomena that are trivial in SD and many other languages. With less busywork needed, it might have been much easier for auditors to discover the VaR error.
  • Readable equations. It’s not uncommon to encounter =E1*EXP($D$3)*SUM(B32:K32)^2/(1+COUNT(A32:K32)) in Excel. While it’s possible to create such gobbledygook in Vensim, it’s rare to actually encounter it, because SD software and habits encourage meaningful variable names and “chunking” equations into comprehensible components. Again, this might have made it much easier for auditors to discover the VaR error.
  • Graphical representation of structure. JPMorgan should get some credit for having a model audit process at all, even though it failed to prevent the error. Auditors’ work is much easier when they can see what the heck is going on in the model. SD software provides useful graphical conventions for revealing model structure. Excel has no graphics. There’s an audit tool, but it’s hampered by the lack of a variable concept, and it’s slower to use than Vensim’s Causal Tracing™.

I think the score’s Forrester 8, Gates 1. Excel is great for light data processing and presentation, but it’s way down my list of tools to choose for serious modeling. The secret to its success, cell-level processing that’s easy to learn and adaptable to many problems, is also its Achilles heel. Add in some agency problems and confirmation bias, and it’s a deadly combination:

There’s another factor at work here. What if the error had gone the wrong way, and the model had incorrectly doubled its estimate of volatility? Then VaR would have been higher, the CIO wouldn’t have been allowed to place such large bets, and the quants would have inspected the model to see what was going on. That kind of error would have been caught. Errors that lower VaR, allowing traders to increase their bets, are the ones that slip through the cracks. That one-sided incentive structure means that we should expect VaR to be systematically underestimated—but since we don’t know the frequency or the size of the errors, we have no idea of how much.

Sadly, the loss on this single trade would probably just about pay for all the commercial SD that’s ever been done.


The Trouble with Spreadsheets



Like spreadsheets, open-loop models are popular but flawed tools. An open loop model is essentially a scenario-specification tool. It translates user input into outcomes, without any intervening dynamics. These are common in public discourse. An example turned up in the very first link when I googled “regional growth forecast”:

The growth forecast is completed in two stages. During the first stage SANDAG staff produces a forecast for the entire San Diego region, called the regionwide forecast. This regionwide forecast does not include any land use constraints, but simply projects growth based on existing demographic and economic trends such as fertility rates, mortality rates, domestic migration, international migration, and economic prosperity.

In other words, there’s unidirectional causality from inputs  to outputs, ignoring the possible effects of the outputs (like prosperity) on the inputs (like migration). Sometimes such scenarios are useful as a starting point for thinking about a problem. However, with no estimate of the likelihood of realization of such a scenario, no understanding of the feedback that would determine the outcome, and no guidance about policy levers that could be used to shape the future, such forecasts won’t get you very far (but they might get you pretty deep – in trouble).

The key question for any policy, is “how do you get there from here?” Models can help answer such questions. In California, one key part of the low-carbon fuel standard (LCFS) analysis was VISION-CA. I wondered what was in it, so I took it apart to see. The short answer is that it’s an open-loop model that demonstrates a physically-feasible path to compliance, but leaves the user wondering what combination of vehicle and fuel prices and other incentives would actually get consumers and producers to take that path.

First, it’s laudable that the model is publicly available for critique, and includes macros that permit replication of key results. That puts it ahead of most analyses right away. Unfortunately, it’s a spreadsheet, which makes it tough to know what’s going on inside.

I translated some of the model core to Vensim for clarity. Here’s the structure:


Bringing the structure into the light reveals that it’s basically a causal tree – from vehicle sales, fuel efficiency, fuel shares, and fuel intensity to emissions. There is one pair of minor feedback loops, concerning the aging of the fleet and vehicle losses. So, this is a vehicle accounting tool that can tell you the consequences of a particular pattern of new vehicle and fuel sales. That’s already a lot of useful information. In particular, it enforces some reality on scenarios, because it imposes the fleet turnover constraint, which imposes a delay in implementation from the time it takes for the vehicle capital stock to adjust. No overnight miracles allowed.

What it doesn’t tell you is whether a particular measure, like an LCFS, can achieve the desired fleet and fuel trajectory with plausible prices and other conditions. It also can’t help you to decide whether an LCFS, emissions tax, or performance mandate is the better policy. That’s because there’s no consumer choice linking vehicle and fuel cost and performance, consumer knowledge, supplier portfolios, and technology to fuel and vehicle sales. Since equilibrium analysis suggests that there could be problems for the LCFS, and disequilibrium generally makes things harder rather than easier, those omissions are problematic.

Continue reading “Fuzzy VISION”

The Trouble with Spreadsheets

As a prelude to my next look at alternative fuels models, some thoughts on spreadsheets.

Everyone loves to hate spreadsheets, and it’s especially easy to hate Excel 2007 for rearranging the interface: a productivity-killer with no discernible benefit. At the same time, everyone uses them. Magne Myrtveit wonders, Why is the spreadsheet so popular when it is so bad?

Spreadsheets are convenient modeling tools, particularly where substantial data is involved, because numerical inputs and outputs are immediately visible and relationships can be created flexibly. However, flexibility and visibility quickly become problematic when more complex models are involved, because:

  • Structure is invisible and equations, using row-column addresses rather than variable names, are sometimes incomprehensible.
  • Dynamics are difficult to represent; only Euler integration is practical, and propagating dynamic equations over rows and columns is tedious and error-prone.
  • Without matrix subscripting, array operations are hard to identify, because they are implemented through the geography of a worksheet.
  • Arrays with more than two or three dimensions are difficult to work with (row, column, sheet, then what?).
  • Data and model are mixed, so that it is easy to inadvertently modify a parameter and save changes, and then later be unable to easily recover the differences between versions. It’s also easy to break the chain of causality by accidentally replacing an equation with a number.
  • Implementation of scenario and sensitivity analysis requires proliferation of spreadsheets or cumbersome macros and add-in tools.
  • Execution is slow for large models.
  • Adherence to good modeling practices like dimensional consistency is impossible to formally verify

For some of the reasons above, auditing the equations of even a modestly complex spreadsheet is an arduous task. That means spreadsheets hardly ever get audited, which contributes to many of them being lousy. (An add-in tool called Exposé can get you out of that pickle to some extent.)

There are, of course, some benefits: spreadsheets are ubiquitous and many people know how to use them. They have pretty formatting and support a wide variety of data input and output. They support many analysis tools, especially with add-ins.

For my own purposes, I generally restrict spreadsheets to data pre- and post-processing. I do almost everything else in Vensim or a programming language. Even seemingly trivial models are better in Vensim, mainly because it’s easier to avoid unit errors, and more fun to do sensitivity analysis with Synthesim.