This is a technical post, so Julian Assange fans can tune out. I’m actually writing about source code management for Vensim models.
Occasionally procurements try to treat model development like software development. That can end in tragedy, because modeling isn’t the same as coding. However, there are many common attributes, and therefore software tools can be useful for modeling.
One typical challenge in model development is version control. Model development is iterative, and I typically go through fifty or more named model versions in the course of a project. C-ROADS is at v142 of its second life. It takes discipline to keep track of all those model iterations, especially if you’d like to be able to document changes along the way and recover old versions. Having a distributed team adds to the challenge.
The old-school way
For modeling projects, I tend to do the following:
- Create a model folder on my local hard drive. This holds the model, plus any ancillary files needed – data, runs, custom graphs, optimization control files, etc. Give it a name like myModelProj_v1+.
- Name the model with a version number, e.g. myModel_v1.mdl
- Whenever there’s a significant upgrade to the model, increment the version number
- Log changes and todo items in a view of the model or an external text file or spreadsheet
- Maintain similar version numbers for the ancillary .xls, .vdf, .vgd, .voc, etc. files.
- When things get crowded, duplicate the model folder, and rename the original with the highest version number, e.g. myModelProj_v12+. Clean out all but the latest version of each file, and continue.
- Any time a major milestone is reached, archive the model folder by zipping it, to prevent inadvertent changes.
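The steps above amount to a simple shell routine. Here's a sketch; all the file and folder names are illustrative, not from an actual project:

```shell
# project folder holds the model plus ancillary files
mkdir -p myModelProj_v1+
touch myModelProj_v1+/myModel_v1.mdl          # model named with a version number

# significant upgrade: increment the version number
cp myModelProj_v1+/myModel_v1.mdl myModelProj_v1+/myModel_v2.mdl

# things get crowded: duplicate the folder, rename the original with the
# highest version number, and continue working in the duplicate
cp -r myModelProj_v1+ myModelProj_continue
mv myModelProj_v1+ myModelProj_v12+
mv myModelProj_continue myModelProj_v13+

# major milestone: archive the frozen folder to prevent inadvertent changes
tar -czf myModelProj_v12.tar.gz myModelProj_v12+
```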
This works pretty well, because you can use Vensim’s Model>Compare command to recover changes between any two model files. To the extent that variable names match, you can use Runs Compare to identify changes in inputs (parameters, data) and behavior.
We’ve adapted this workflow to distributed teams by using Dropbox or Groove to synchronize a shared project folder. However, that requires some discipline to keep track of who currently has control of the model.
This folder-based approach is effective, but can get messy when you start branching your model down different experimental paths. It also precludes concurrent work on a model version.
Software engineers solved this problem a long time ago, with source control systems like CVS, Subversion and git. In a nutshell, these add a time dimension to your file hierarchy. In the client-server variety (CVS, Subversion), the server manages a repository that keeps versions of files over time, while the client lets you check files in and out of the repository, merge changes and resolve conflicts with other users. (Git is distributed, with a full repository on every machine.) Either way, you can easily identify changes and revert to earlier versions. There are much better explanations elsewhere.
For modeling on a Windows desktop, it’s easy to start with TortoiseSVN. TortoiseSVN is a Subversion client that’s integrated with Windows Explorer, so you can manage your project with a simple right-click on files. It can create and use local repositories directly on disk, so you don’t need a separate server to run it.
- Download & install. While you’re at it, get the manual.
- Create a repository
- Create a project folder in the repository and import your data; if you’re working with an existing model, you’ll probably want to use the “import in place” method.
- Heed the advice about cleaning up beforehand – it’ll save work later.
- Before import, switch binary (.vmf) format models to text (.mdl) – that way you’ll be able to see changes at the equation level. If you’ve been avoiding text format because of the proliferation of bitmap files saved with embedded graphics, that’s no longer a worry: you won’t be saving each iteration of the model under a new name.
- You can remove version numbers from files if you like, because they’re no longer needed (the repository manages history).
- The rest is easy (but there’s a lot of it).
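For reference, the same setup can be done from the command line, assuming the standard svn/svnadmin tools are installed. A rough sketch, with all names illustrative:

```shell
# stand-in for an existing project folder
mkdir -p myModelProj && touch myModelProj/myModel.mdl

# create a file-based repository (no server needed)
svnadmin create "$HOME/modelrepo"

# import the project, then check out a working copy
svn import myModelProj "file://$HOME/modelrepo/myModelProj/trunk" -m "Initial import"
svn checkout "file://$HOME/modelrepo/myModelProj/trunk" myModelProj-wc

# keep bulky run output (.vdf) out of the repository, and commit as you work
svn propset svn:ignore "*.vdf" myModelProj-wc
svn commit myModelProj-wc -m "Ignore simulation output"
```

TortoiseSVN does all of this through right-click menus, so you never need to touch these commands, but they show what’s happening under the hood.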
One practice you may want to consider is selective tracking of datasets (.vdf files). A small model can generate a lot of output, and storing it all may be a waste of repository space. I’m currently tracking nonvolatile things like data imported to .vdf files from external sources, but not every run ever made. Runs are easily reconstructed from a model if you use some discipline: make experiments replicable by setting them up with changes files (.cin) and/or command scripts (.cmd). The control files for runs are typically much smaller than the resulting output dataset.
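To make that concrete, a replicable experiment might pair a changes file with a short command script, something like the following (file names and parameters are hypothetical; check the Vensim documentation for exact command syntax):

```
contact rate = 4
infectivity = 0.05
```

That’s experiment1.cin, overriding a couple of constants. A command script then replays the run from the model:

```
SPECIAL>LOADMODEL|myModel.mdl
SIMULATE>READCIN|experiment1.cin
SIMULATE>RUNNAME|experiment1
MENU>RUN|O
```

Those two small text files check into the repository cheaply and diff cleanly, while the .vdf output they generate can be recreated on demand.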
If you have a distributed team, there’s one more thing you need: a server that’s accessible from the wider web. You can run your own, or find a host. Once your host repository is populated, checking things in and out is about the same. In fact, if you’re joining an existing project, it’s dead easy: just right click where you want to put your working copy and select Checkout, paste in the repository URL, and grab the folder you want.
I haven’t been working with subversion for a while, so I’m rediscovering a lot of this. I’ll post additional thoughts as they come up. One thing on my list is to check out git as an alternative to svn. Git maintains fully populated local repositories, which may be a big advantage for modeling because it enables fast comparison with old versions. I’m interested to hear your thoughts on model-source control.