Automating the London Whale error

This one little trick might prevent you from blowing up your organization.

I’m setting up a new computer, and Excel’s default autosaving nuked one of my Ventity input spreadsheets. I did some quick analysis on the data, intending the changes to be volatile, but Excel cheerfully made them permanent and synced them to my other computers. Luckily Ventity noticed when I ran the model.

In theory, it’s no big deal, because you can undo autosaved changes. Except that in practice it’s a very big deal, because you can’t easily see that changes have been autosaved at all. Change a few numbers here and there, and pretty soon you’re on your way to the next London Whale trading disaster. Model integrity is nonexistent.
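Ventity caught the change because it keeps track of its inputs. If you want a similar safety net for any model’s input files, one low-tech option is a checksum manifest that gets compared before each run. Here’s a minimal sketch in Python; the file names and manifest location are illustrative, not from anything above:

```python
import hashlib
import json
from pathlib import Path

MANIFEST = Path("input_manifest.json")      # hypothetical manifest location
INPUTS = [Path("inputs/parameters.xlsx")]   # hypothetical model input files

def digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def check_inputs() -> None:
    """Warn if any input file differs from the recorded manifest, then update it."""
    recorded = json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {}
    current = {str(p): digest(p) for p in INPUTS}
    for name, h in current.items():
        if recorded.get(name) not in (None, h):
            print(f"WARNING: {name} changed since last run (autosave?)")
    MANIFEST.write_text(json.dumps(current, indent=2))

if __name__ == "__main__":
    check_inputs()
```

Run it as a pre-flight step before the model; any silently autosaved edit shows up as a digest mismatch.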

Fortunately, you can change the backwards default behavior easily. All you need to do is uncheck the AutoSave-by-default box (in current Excel versions it lives under File > Options > Save):

Do it now!
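If you manage more than one machine, unchecking the box everywhere gets tedious. Microsoft also exposes an Office Group Policy setting (“Don’t AutoSave files in Excel”) to enforce the same default. The registry value below reflects my reading of where that policy lands; treat the key and value name as assumptions and verify them against Microsoft’s documentation for your Office version before relying on this:

```python
import winreg

# ASSUMPTION: my reading of the Office ADMX templates is that the
# "Don't AutoSave files in Excel" policy writes this value; confirm the
# key and value name for your Office version before relying on it.
KEY = r"Software\Policies\Microsoft\Office\16.0\Excel\FileIO"

with winreg.CreateKey(winreg.HKEY_CURRENT_USER, KEY) as key:
    # 1 = files open with AutoSave off by default in Excel
    winreg.SetValueEx(key, "DontAutoSave", 0, winreg.REG_DWORD, 1)

print("AutoSave-off-by-default policy value set for Excel.")
```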

The Trouble with Spreadsheets

As a prelude to my next look at alternative fuels models, some thoughts on spreadsheets.

Everyone loves to hate spreadsheets, and it’s especially easy to hate Excel 2007 for rearranging the interface: a productivity-killer with no discernible benefit. At the same time, everyone uses them. Magne Myrtveit wonders, “Why is the spreadsheet so popular when it is so bad?”

Spreadsheets are convenient modeling tools, particularly where substantial data is involved, because numerical inputs and outputs are immediately visible and relationships can be created flexibly. However, flexibility and visibility quickly become problematic when more complex models are involved, because:

  • Structure is invisible and equations, using row-column addresses rather than variable names, are sometimes incomprehensible.
  • Dynamics are difficult to represent; only Euler integration is practical, and propagating dynamic equations over rows and columns is tedious and error-prone (see the sketch after this list).
  • Without matrix subscripting, array operations are hard to identify, because they are implemented through the geography of a worksheet.
  • Arrays with more than two or three dimensions are difficult to work with (row, column, sheet, then what?).
  • Data and model are mixed, so it is easy to inadvertently modify a parameter and save the change, and then find it hard to recover the differences between versions later. It’s also easy to break the chain of causality by accidentally replacing an equation with a number.
  • Implementation of scenario and sensitivity analysis requires proliferation of spreadsheets or cumbersome macros and add-in tools.
  • Execution is slow for large models.
  • Adherence to good modeling practices like dimensional consistency is impossible to formally verify.
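To make the dynamics point concrete, here’s what Euler integration of a trivial stock-and-flow model looks like in script form, with a parameter sweep standing in for the sensitivity analysis that spreadsheets make so awkward. The model (first-order decay of a stock) and all names are illustrative:

```python
import numpy as np

def simulate(decay_rate: float, initial_stock: float = 100.0,
             dt: float = 0.25, t_final: float = 10.0) -> np.ndarray:
    """Euler-integrate the stock equation dStock/dt = -decay_rate * Stock."""
    steps = int(t_final / dt)
    stock = np.empty(steps + 1)
    stock[0] = initial_stock
    for i in range(steps):
        outflow = decay_rate * stock[i]          # flow equation
        stock[i + 1] = stock[i] - outflow * dt   # Euler step
    return stock

# Sensitivity analysis as a loop over parameters, not a pile of workbook copies.
for rate in (0.1, 0.2, 0.4):
    print(f"decay rate {rate:.1f}: final stock {simulate(rate)[-1]:.2f}")
```

The equation appears once, named, with an explicit time step, instead of being copied down thousands of rows.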

For some of the reasons above, auditing the equations of even a modestly complex spreadsheet is an arduous task. That means spreadsheets hardly ever get audited, which contributes to many of them being lousy. (An add-in tool called Exposé can get you out of that pickle to some extent.)
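Scripting at least the first pass of an audit is straightforward. As a sketch, the openpyxl library can dump every formula in a workbook for review; the workbook name is hypothetical:

```python
from openpyxl import load_workbook

# data_only=False (the default) returns formula strings, not cached values.
wb = load_workbook("model.xlsx")  # hypothetical workbook

for ws in wb.worksheets:
    for row in ws.iter_rows():
        for cell in row:
            # Formula cells hold strings beginning with "=".
            if isinstance(cell.value, str) and cell.value.startswith("="):
                print(f"{ws.title}!{cell.coordinate}: {cell.value}")
```

A flat listing like this also makes it possible to diff two versions of a workbook, which helps with the version-recovery problem noted in the list above.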

There are, of course, some benefits: spreadsheets are ubiquitous and many people know how to use them. They have pretty formatting and support a wide variety of data input and output. They support many analysis tools, especially with add-ins.

For my own purposes, I generally restrict spreadsheets to data pre- and post-processing. I do almost everything else in Vensim or a programming language. Even seemingly trivial models are better in Vensim, mainly because it’s easier to avoid unit errors, and more fun to do sensitivity analysis with Synthesim.
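Dimensional consistency is something a script can actually enforce. As a sketch of what that buys you, the pint library for Python raises an error whenever an equation’s units don’t balance; the quantities are illustrative:

```python
import pint

ureg = pint.UnitRegistry()

capacity = 500 * ureg.megawatt   # plant capacity
hours = 8760 * ureg.hour         # hours in a year

energy = capacity * hours        # power * time = energy: dimensions check out
print(energy.to(ureg.gigawatt_hour))

# A unit error raises immediately instead of silently producing a number:
try:
    total = capacity + hours     # power + time: dimensionally inconsistent
except pint.DimensionalityError as err:
    print("unit error:", err)
```

In a spreadsheet, the same mistake just yields a plausible-looking number.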