Danger – another technical post, mainly relevant to users of advanced Vensim versions.
The title has a double meaning: I’m talking about optimizing the speed of a model, which is most often needed for optimization problems.
Here’s the challenge: you have a model, and you think you understand it. You’d like to calibrate it to data, do some policy optimization to identify good decision rules, and do some Monte Carlo simulation to identify sensitive parameters and robust policies. All of those things take thousands or even millions of model runs. Or, maybe you’d just like to make a slightly sluggish model fast enough to run interactively with Synthesim.
Here’s what you can do:
- Run compiled. Probably your best bet is to use MS Visual c++ 2010 Express, which is free, or an old copy of MSVC 6. Some of the versions in between apparently have problems. You may need to change Vensim’s mdl.bat file to match the specific paths of your software. You might succeed with free tools, like gcc+, but I haven’t tried – I’d be interested to hear of such adventures. Update: You now want Visual Studio with the C/C++ workload. For recent versions of Vensim, it’s actually mdldp64.bat that contains the active compile script. It usually works right out of the box.
- Use the INITIAL() statement as much as you can. In particular, be sure that any data-retrieval functions like GET DATA AT TIME are wrapped in an initial if possible (they’re comparatively slow).
- Consider using data equations for input calculations that are not part of the feedback structure of the model. Data equations get executed once, at the start of a group of optimization/sensitivity/Synthesim simulations, rather than with each iteration. However, don’t use data equations where parameters you plan to change are involved, because then your changes won’t propagate through the results. If you want to save startup time too, move complex data calculations into an offline data model.
- Switch any sparse array summary calculations from SUM, VMIN, VMAX, and PROD functions to VECTOR SELECT or VECTOR ELM MAP. Update: I’m no longer sure this is worth the trouble.
- Update: Calculate vector sums only once. For example, instead of writing share[k] = quantity[k]/SUM(quantity[k!]), calculate the sum separately, so that share[k] = quantity[k]/total quantity and total quantity = SUM(quantity[k!]).
- Get rid of any extraneous structure that’s not related to your payoff or other variables of interest. If you still want the information, move it to a separate model that reads the main model’s .vdf for postprocessing. Update: you can put it in a submodel and load it, or not, as needed.
- Consider whether your payoff is separable. In other words, if your model boils down to payoff = part1(parameter1) + part2(parameter2), you can optimize sequentially for parameter1 and parameter2, since they don’t interact. Vensim’s algorithm is designed for the worst case – parameters that interact, multiple optima, discontinuities, etc. You can automate a sequential set of optimizations with a command script.
- Consider transforming some of your optimization parameters. For example, if you are fitting to data for a stock with a first order outflow, that outflow can be written as stock/tau or stock*delta, where tau and delta are the lifetime and fractional loss rate, respectively. Mathematically, it makes no difference whether you use the tau or delta approach, but in some practical cases it might. For example, if you think delta might be near zero (a long lifetime), you might do better to optimize over delta = [0,1] than tau = [1,1e9].
- If you’re looking for hardware improvements, clock speed matters more than multicores and cache. Update: for optimization, MCMC and sensitivity runs, multiple cores help a lot in v10+.
These options are in the order that I thought of them, which means that they’re very roughly in order of likely improvement per unit effort.
Unfortunately, it’s often the case that all of this will get you a 10x improvement, and you need 1000x. Unless you have a supercomputer or massive parallel grid at your disposal, the only real remedy is to simplify your model. Fortunately, that’s not necessarily a bad thing.
Update: One more thing to try: if you’re doing single simulations with a large model, the bottleneck may be the long disk write of the output .vdf file. In that case, you can use a savelist (.lst) to restrict the number of variables stored to just the output you’re interested in. However, you should occasionally do a run without a savelist, and browse through the model results to be sure that things are OK. Consider writing some Reality Checks to enforce quality control on things you won’t be looking at.
Update 2: Another suggestion, via Hazhir Rahmandad: when using the ALLOC functions (DEMAND AT PRICE, etc.), choose simple demand/supply function shapes, like triangular, rather than complex shapes like exponential. The allocation functions iterate to equilibrium internally, and this can be time consuming. Avoiding the use of FIND ZERO and SIMULTANEOUS where possible is also helpful – often a lookup or polynomial approximation to the solution of simultaneous equations will suffice.