Parameter Distributions

Answering my own question, here’s the distribution of all 12,000 constants from a dozen models, harvested from my hard drive. About half are from Ventana, and there are a few classics, like World3. All are policy models – no physics, biology, etc.

[Figure: ParamDist]

The vertical scale is magnitude, ABS(x). Values are sorted on the horizontal axis, so that negative values appear on the left. Incredibly, there were only about 60 negative values in the set. Clearly, unlike linear models where signs fall where they may, there’s a strong user preference for parameters with a positive sense.

Next comes a big block of 0s, which don’t show on the log scale. Most of the 0s are not really interesting parameters; they’re things like switches in subscript mapping, though doubtless some are real.

At the right are the positive values, ranging from about 10^-15 to 10^+15. The extremes are units converters and physical parameters (area of the earth). There are a couple of flat spots in the distribution – 1s (green arrow), probably corresponding with the 0s, though some are surely “interesting”, and years (i.e. things with a value of about 2000, blue arrow).

If you look at just the positive parameters, here’s the density histogram, in log10 magnitude bins:

[Figure: PositiveParmDist]

Again, the two big peaks are the 1s and the 2000s. The 0s would be off the scale by a factor of 2. There’s clearly some asymmetry – more numbers greater than 1 (magnitude 0) than less.
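The binning behind a histogram like this is simple to reproduce. Here's a minimal sketch (the post doesn't show the actual harvesting or plotting code, so the function name and example values are illustrative): bin each positive value by the integer part of its log10 magnitude, counting zeros and negatives separately as in the plots.

```python
import math
from collections import Counter

def log10_magnitude_histogram(params):
    """Bin positive values by integer log10 magnitude.
    Zeros and negatives are tallied separately, mirroring their
    separate treatment in the plots."""
    bins = Counter()
    zeros = negatives = 0
    for x in params:
        if x == 0:
            zeros += 1
        elif x < 0:
            negatives += 1
        else:
            bins[math.floor(math.log10(x))] += 1
    return bins, zeros, negatives

# Illustrative values: 1s land in bin 0, years (~2000) in bin 3
bins, zeros, negatives = log10_magnitude_histogram([0, -0.5, 1, 1, 2000, 3e8])
```

Plotting the bin counts against the bin index gives the density histogram above.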

[Figure: LogPositiveParamDist]

One thing that seems clear here is that log-uniform (which would be a flat line on the last two graphs) is a bad guess.

What's the empirical distribution of parameters?

Vensim’s answer to exploring ill-behaved problem spaces is either to do hill climbing with random restarts, or MCMC and simulated annealing. Either way, you need to start with some initial distribution of points to search from.
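To make the role of that initial distribution concrete, here's a toy sketch of random-restart hill climbing (not Vensim's actual implementation; the objective and sampler are made up). The `sampler` argument is exactly the initial distribution at issue: restarts are only as good as the points it proposes.

```python
import random

def hill_climb(f, start, step=0.1, iters=500):
    """Greedy local search: accept random perturbations that improve f."""
    x, fx = list(start), f(start)
    for _ in range(iters):
        cand = [xi + random.uniform(-step, step) for xi in x]
        fc = f(cand)
        if fc < fx:
            x, fx = cand, fc
    return x, fx

def multistart(f, sampler, restarts=20):
    """Random restarts: draw starting points from the sampler,
    keep the best local result."""
    best = None
    for _ in range(restarts):
        x, fx = hill_climb(f, sampler())
        if best is None or fx < best[1]:
            best = (x, fx)
    return best

random.seed(0)
# Toy objective with its minimum at (1, -2)
f = lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2
sampler = lambda: [random.uniform(-10, 10) for _ in range(2)]
x_best, f_best = multistart(f, sampler)
```

Swap the uniform sampler for a different prior and the search behavior changes accordingly, which is why the choice of distribution matters.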

It’s helpful if that distribution is somehow efficient at exploring the interesting parts of the space. I think this is closely related to the problem of selecting uninformative priors in Bayesian statistics. There’s lots of research about appropriate uninformative priors for various kinds of parameters. For example,

  • If a parameter represents a probability, one might choose the Jeffreys or Haldane prior.
  • Indifference to units, scale, and inversion might suggest a log-uniform prior, when nothing else is known about a positive parameter.
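Sampling from a log-uniform prior is a one-liner worth spelling out, since it comes up again below (this is a generic sketch, not code from Vensim): draw uniformly in log space and exponentiate, so every decade is equally likely.

```python
import math
import random

def log_uniform(lo, hi, rng=random):
    """Draw from a log-uniform distribution on [lo, hi], lo > 0.
    Uniform in log space, hence indifferent to scale and inversion."""
    return math.exp(rng.uniform(math.log(lo), math.log(hi)))

random.seed(1)
draws = [log_uniform(1e-3, 1e3) for _ in range(10000)]
# By symmetry, about half the draws fall below the geometric midpoint, 1.0
frac_below_1 = sum(d < 1.0 for d in draws) / len(draws)
```

Note the symmetry: under this prior, a time constant and its reciprocal (a rate) have the same distribution, which is the "indifference to inversion" property.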

However, when a user specifies a parameter in Vensim, we don’t even know what it represents. So what’s the appropriate prior for a parameter that might be positive or negative, a probability, a time constant, a scale factor, an initial condition for a physical stock, etc.?

On the other hand, we aren’t quite as ignorant as the pure maximum entropy derivation usually assumes. For example,

  • All numbers have to lie between the largest and smallest representable float or double, i.e. about +/- 3.4e38 or 1.8e308.
  • More practically, no one scales their models such that a parameter like 6.5e173 would ever be required. There’s a reason that metric prefixes range from yotta to yocto (10^24 to 10^-24). The only constant I can think of that approaches that range is Avogadro’s number (though there are probably others), and that’s not normally a changeable parameter.
  • For lots of things, one can impose more constraints, given a little more information,
    • A time constant or delay must lie on [TIME STEP, infinity), and the “infinity” of interest is practically limited by the simulation duration.
    • A fractional rate of change must similarly lie on [-1/TIME STEP, 1/TIME STEP] for stability.
    • Other parameters probably have limits for stability, though it may be hard to discover them except by experiment.
    • A parameter with units of year is probably modern, [1900-2100], unless you’re doing Mayan archaeology or paleoclimate.
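The constraints above translate directly into default search bounds. Here's a hypothetical helper along those lines (the function name, kinds, and defaults are mine, not Vensim's):

```python
def default_bounds(kind, time_step=0.25, duration=100.0):
    """Map a loosely specified parameter kind to a search range,
    following the constraints listed above. Illustrative only."""
    if kind == "time constant":
        # [TIME STEP, "infinity"), capped in practice by the duration
        return (time_step, duration)
    if kind == "fractional rate":
        # stability limit
        return (-1.0 / time_step, 1.0 / time_step)
    if kind == "year":
        # probably modern, barring Mayan archaeology or paleoclimate
        return (1900.0, 2100.0)
    if kind == "probability":
        return (0.0, 1.0)
    raise ValueError(f"no default bounds for {kind!r}")

bounds = {k: default_bounds(k) for k in
          ("time constant", "fractional rate", "year", "probability")}
```

Even this crude dispatch narrows the search space by many orders of magnitude compared with "anywhere a double can reach."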

At some point, the assumptions become too heroic, and we need to rely on users for some help. But it would still be really interesting to see the distribution of all parameters in real models. (See next …)

In {R^n, n large}, no one can hear you scream.

I haven’t had time to write much lately. I spent several weeks in arcane code purgatory, discovering the fun of macros containing uninitialized thread pointers that only fail in 64 bit environments, and for different reasons on Windows, Mac and Linux. That’s a dark place that I hope never again to visit.

Now I’m working on fun things again, but they’re secret, so I can’t discuss details. Instead, I’ll just share a little observation that came up in the process.

Frequently, we do calibration or policy optimization on models with a lot of parameters. “A lot” is actually a pretty small number – like 10 – when you have to do things by brute force. This works more often than we have a right to expect, given the potential combinatorial explosion this entails.
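For a sense of scale, here's the brute-force arithmetic (illustrative numbers, not from any particular model): a grid of k candidate values per parameter costs k^n runs.

```python
def grid_size(n_params, points_per_param=5):
    """Number of model runs for an exhaustive grid search."""
    return points_per_param ** n_params

# Even a coarse 5-point grid explodes quickly with dimension
sizes = {n: grid_size(n) for n in (2, 5, 10)}
# 2 params: 25 runs; 10 params: 5**10 = 9,765,625 runs
```

So at 10 parameters, even a very coarse grid is already in the millions of simulations, which is why exhaustive search is off the table.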

However, I suspect that we (at least I) don’t fully appreciate what’s going on. Here are two provable facts that make sense upon reflection, but weren’t part of my intuition about such problems:

  • The diagonal of the unit hypercube grows without bound, as sqrt(n), even though every edge stays length 1.
  • The fraction of the hypercube’s volume lying inside its inscribed hypersphere goes to zero as n grows, so nearly all of the volume ends up in the corners.

In other words, R^n gets big really fast, and it’s all corners. The saving grace is probably that sensible parameters are frequently distributed on low-dimensional manifolds embedded in high dimensional spaces. But we should probably be more afraid than we typically are.
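The "all corners" claim is easy to verify numerically. Here's a small Monte Carlo check (my own illustration, not from the post): estimate the fraction of the unit hypercube's volume that lies inside its inscribed sphere of radius 1/2, and watch it collapse as dimension grows.

```python
import random

def inscribed_sphere_fraction(n, samples=20000, seed=0):
    """Monte Carlo estimate of the fraction of the unit hypercube's
    volume inside its inscribed sphere (radius 1/2, centered).
    Everything outside that sphere is 'corner' volume."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        # squared distance of a uniform random point from the cube center
        d2 = sum((rng.random() - 0.5) ** 2 for _ in range(n))
        if d2 <= 0.25:
            inside += 1
    return inside / samples

frac2 = inscribed_sphere_fraction(2)    # exact value is pi/4, about 0.785
frac10 = inscribed_sphere_fraction(10)  # exact value is about 0.0025
```

In 2D the inscribed circle covers most of the square; by 10 dimensions the sphere holds a fraction of a percent of the volume, and a uniform sample almost never lands in it.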