stochastic – MetaSD

Sources of Uncertainty

The confidence bounds I showed in my previous post have some interesting features. The following plots show three sources of the uncertainty in simulated surveillance for Chronic Wasting Disease in deer.

Parameter uncertainty
Sampling error in the measurement process
Driving noise from random interactions in the population

You could add external disturbances like weather to this list, though we don’t simulate it here.

By way of background, this come from a fairly big model that combines the dynamics of the host (deer) with an SIR-like model of disease transmission and progression. There’s quite a bit of disaggregation (regions, ages, sexes). The model is driven by historic harvest and sample sizes, and generates deer population, vegetation, and disease dynamics endogenously. The parameters used here represent a Bayesian posterior, from MCMC with literature priors and a lot of data. The parameter sample from the posterior is a joint distribution that captures both individual parameter variation and covariation (though with only a few exceptions things seem to be relatively independent).

Here’s the effect of parameter uncertainty on the disease trajectory:

Each of the 10,000 runs making up this ensemble is deterministic. It’s surprisingly tight, because it is well-determined by the data.

However, parameter uncertainty is not the only issue. Even if you know the actual state of the disease perfectly, there’s still uncertainty in the reported outcome due to sampling variation. You might stray from the “true” prevalence of the disease because of chance in the selection of which deer are actually tested. Making sampling stochastic broadens the bounds:

That’s still not the whole picture, because deer aren’t really deterministic. They come in integer quanta and they have random interactions. Thus a standard SD formulation like:

births = birth rate * doe population

becomes

births = Poisson( birth rate * doe population )

For stock outflows, like the transition from healthy to infected, the Binomial distribution may be the appropriate choice. This means there’s additional variance around the deterministic course, and the model can explore a wider set of trajectories.

There’s one other interesting feature, particularly evident in this last graph: uncertainty around the mean (i.e. the geometric standard deviation) varies quite a bit. Initially, uncertainty increases with time – as Yogi Berra said, “It’s tough to make predictions, especially about the future.” In the early stages of the disese (2003-2008 say), numbers are small and random events affect the timing of takeoff of the disease, amplified by future positive feedback. A deterministic disease model with reproduction ratio R0>1 can only grow, but in a stochastic model luck can cause the disease to go extinct or bumble around 0 prevalence for a while before emerging into growth. Towards the end of this simulation, the confidence bounds narrow. There are two reasons for this: negative feedback is starting to dominate as the disease approaches saturation prevalence, and at the same time the normalized standard deviation of the sampling errors and randomness in deer dynamics is decreasing as the numbers become larger (essentially with 1/sqrt(n)).

This is not uncommon in real systems. For example, you may be unsure where a chaotic pendulum will be in it’s swing a minute from now. But you can be pretty sure that after an hour or a day it will be hanging idle at dead center. However, this might not remain true when you broaden the boundary of the system to include additional feedbacks or disturbances. In this CWD model, for example, there’s some additional feedback from human behavior (not in the statistical model, but in the full version) that conditions the eventual saturation point, perhaps preventing convergence of uncertainty.

Randomness in System Dynamics

A discrete event simulation guru asked Ventana colleague David Peterson about the representation of randomness in System Dynamics models. In the discrete event world, granular randomness is the whole game, and it often doesn’t make sense to look at something without doing a lot of Monte Carlo experiments, because any single run could be misleading. The reply:

Randomness 1: System Dynamics models often incorporate random components, in two ways:
- Internal: the system itself is stochastic (e.g. parts failures, random variations in sales, Poisson arrivals, etc.
- External: All the usual Monte-Carlo explorations of uncertainty from either internal randomness or via replacing constant-but-unknown parameters with probability distributions as a form of sensitivity analysis.
Randomness 2: There is also a kind of probabilistic flavor to the deterministic simulations in System Dynamics. If one has a stochastic linear differential equation with deterministic coefficients and Gaussian exogenous inputs, it is easy to prove that all the state variables have time-varying Gaussian densities. Further, the time-trajectories of the means of those Gaussian process can be computed immediately by the deterministic linear differential equation which is just the original stochastic equations, with all random inputs replaced by their mean trajectories. In System Dynamics, this concept, rigorous in the linear case, is extended informally to the nonlinear case as an approximation. That is, the deterministic solution of a System Dynamics model is often taken as an approximation of what would be concluded about the mean of a Monte-Carlo exploration. Of course it is only an approximate notion, and it gives no information at all about the variances of the stochastic variables.
Randomness 3: A third kind of randomness in System Dynamics models is also a bit informal: delays, which might be naturally modeled as stochastic, are modeled as deterministic but distributed. For example, if procurement orders are received on average 6 months later, with randomness of an unspecified nature, a typical System Dynamics model would represent the procurement delay as a deterministic subsystem, usually a first- or third-order exponential delay. That is the output of the delay, in response to a pulse input, is a first- or third-order Erlang shape. These exponential delays often do a good job of matching data taken from high-volume stochastic processes.
Randomness 4: The Vensim software includes extended Kalman filtering to jointly process a model and data, to estimate the most likely time trajectories of the mean and variance/covariance of the state variables of the model. Vensim also includes the Schweppe algorithm for using such extended filters to compute maximum-likelihood estimates of parameters and their variances and covariances. The system itself might be completely deterministic, but the state and/or parameters are uncertain trajectories or constants, with the uncertainty coming from a stochastic system, or unspecified model approximations, or measurement errors, or all three.

“Vanilla” SD starts with #2 and #3. That seems weird to people used to the pervasive randomness of discrete event simulation, but has a huge advantage of making it easy to understand what’s going on in the model, because there is no obscuring noise. As soon as things are nonlinear or non-Gaussian enough, or variance matters, you’re into the explicit representation of stochastic processes. But even then, I find it easier to build and debug a model deterministically, and then turn on randomness. We explicitly reserve time for this in most projects, but interestingly, in top-down strategic environments, it’s the demand that lags. Clients are used to point predictions and take a while to get into the Monte Carlo mindset (forget about stochastic processes within simulations). The financial crisis seems to have increased interest in exploring uncertainty though.

Rental car stochastic dynamics

This is a little experimental model that I developed to investigate stochastic allocation of rental cars, in response to a Vensim forum question.

There’s a single fleet of rental cars distributed around 50 cities, connected by a random distance matrix (probably not physically realizable on a 2D manifold, but good enough for test purposes). In each city, customers arrive at random, rent a car if available, and return it locally or in another city. Along the way, the dawdle a bit, so returns are essentially a 2nd order delay of rentals: a combination of transit time and idle time.

The two interesting features here are:

Proper use of Poisson arrivals within each time step, so that car flows are dimensionally consistent and preserve the integer constraint (no fractional cars)
Use of Vensim’s ALLOC_P/MARKETP functions to constrain rentals when car availability is low. The usual approach, setting actual = MIN(desired, available/TIME STEP), doesn’t work because available is subscripted by 50 cities, while desired has 50 x 50 origin-destination pairs. Therefore the constrained allocation could result in fractional cars. The alternative approach is to set up a randomized first-come, first-served queue, so that any shortfall preserves the integer constraint.

The interesting experiment with this model is to lower the fleet until it becomes a constraint (at around 10,000 cars).

Documentation is sparse, but units balance.

Requires an advanced Vensim version (for arrays) or the free Model Reader.

carRental.vpm carRental.vmf

Update, with improved distribution choice and smaller array dimensions for convenience:

carRental2.mdl carRental2.vpm

Pink Noise

In a continuous time dynamic model, representing noise as a random draw at every time step can be problematic. As the time step is decreased, the high frequency power of the noise spectrum increases accordingly, potentially changing the behavior. In the limit of small time steps, the resulting white noise has infinite power, which is not physically realistic.

The solution is to use pink noise, which is essentially white noise filtered to cut off high frequencies. SD models from the bad old days typically employed a pink noise generating structure that employed uniformly distributed white noise, relying on the central limit theorem to yield a normally distributed output. Ed Anderson improved that structure to incorporate a normally distributed input, which works better, especially if the cutoff frequency is close to the inverse of the time step.

Two versions of the model are attached: one for advanced versions of Vensim, which permit implementation as a :MACRO:, for efficient reuse. The other works with Vensim PLE.

PinkNoise2010.mdl PinkNoise2010.vmf PinkNoise2010.vpm

PinkNoise2010-PLE.vmf PinkNoise2010-PLE.vpm

Contributed by Ed Anderson, updated by Tom Fiddaman

Notes (also in the model files):

Description: The pink noise molecule described generates a simple random series with autocorrelation. This is useful in representing time series, like rainfall from day to day, in which today’s value has some correlation with what happened yesterday. This particular formulation will also have properties such as standard deviation and mean that are insensitive both to the time step and the correlation (smoothing) time. Finally, the output as a whole and the difference in values between any two days is guaranteed to be Gaussian (normal) in distribution.

Behavior: Pink noise series will have both a historical and a random component during each period. The relative “trend-to-noise” ratio is controlled by the length of the correlation time. As the correlation time approaches zero, the pink noise output will become more independent of its historical value and more “noisy.” On the other hand, as the correlation time approaches infinity, the pink noise output will approximate a continuous time random walk or Brownian motion. Displayed above are two time series with correlation times of 1 and 8 months. While both series have approximately the same standard deviation, the 1-month correlation time series is less smooth from period to period than the 8-month series, which is characterized by “sustained” swings in a given direction. Note that this behavior will be independent of the time-step. The “pink” in pink noise refers to the power spectrum of the output. A time series in which each period’s observation is independent of the past is characterized by a flat or “white” power spectrum. Smoothing a time series attenuates the higher or “bluer” frequencies of the power spectrum, leaving the lower or “redder” frequencies relatively stronger in the output.

Caveats: This assumes the use of Euler integration with a time step of no more than 1/4 of the correlation time. Very long correlation times should be avoided also as the multiplication in the scaled white noise will become progressively less accurate.

Technical Notes: This particular form of pink noise is superior to that of Britting presented in Richardson and Pugh (1981) because the Gaussian (Normal) distribution of the output does not depend on the Central Limit Theorem. (Dynamo did not have a Gaussian random number generator and hence R&P had to invoke the CLM to get a normal distribution.) Rather, this molecule’s normal output is a result of the observations being a sum of Gaussian draws. Hence, the series over short intervals should better approximate normality than the macro in R&P.

MEAN: This is the desired mean for the pink noise.

STD DEVIATION: This is the desired standard deviation for the pink noise.

CORRELATION TIME: This is the smooth time for the noise, or for the more technically minded this is the inverse of the filter’s cut-off frequency in radians.