Subversive modeling

This is a technical post, so Julian Assange fans can tune out. I’m actually writing about source code management for Vensim models.

Occasionally procurements try to treat model development like software development. That can end in tragedy, because modeling isn’t the same as coding. However, there are many common attributes, and therefore software tools can be useful for modeling.

One typical challenge in model development is version control. Model development is iterative, and I typically go through fifty or more named model versions in the course of a project. C-ROADS is at v142 of its second life. It takes discipline to keep track of all those model iterations, especially if you’d like to be able to document changes along the way and recover old versions. Having a distributed team adds to the challenge.

The old school way

For modeling projects, I tend to do the following:

  • Create a model folder on my local hard drive. This will hold the model, plus any ancillary files needed – data, runs, custom graphs, optimization control files, etc. Give it a name such as myModelProj_v1+.
  • Name the model with a version number, e.g. myModel_v1.mdl
  • Whenever there’s a significant upgrade to the model, increment the version number
  • Log changes and todo items in a view of the model or an external text file or spreadsheet
  • Maintain similar version numbers for the ancillary .xls, .vdf, .vgd, .voc, etc. files.
  • When things get crowded, duplicate the model folder, and rename the original with the highest version number, e.g. myModelProj_v12+. Clean out all but the latest version of each file, and continue.
  • Any time a major milestone is reached, archive the model folder by zipping it, to prevent inadvertent changes.
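The version-bumping and archiving steps above can be sketched in a few lines of Python. This is only an illustration, assuming file names end in a `_vN` suffix as in the examples above; the function names `bump_version` and `archive_folder` are hypothetical helpers, not part of Vensim.

```python
import re
import shutil
from pathlib import Path


def bump_version(filename: str) -> str:
    """Return the file name with its trailing _vN incremented by one."""
    path = Path(filename)
    match = re.fullmatch(r"(.*_v)(\d+)", path.stem)
    if match is None:
        raise ValueError(f"no _vN version suffix in {filename!r}")
    prefix, number = match.groups()
    return str(path.with_name(f"{prefix}{int(number) + 1}{path.suffix}"))


def archive_folder(folder: str) -> str:
    """Zip a model folder next to itself and return the archive path."""
    folder_path = Path(folder).resolve()
    return shutil.make_archive(
        str(folder_path),             # archive name without the .zip extension
        "zip",
        root_dir=folder_path.parent,  # paths in the archive are relative to this
        base_dir=folder_path.name,    # only this folder is archived
    )
```

For example, `bump_version("myModel_v1.mdl")` gives `myModel_v2.mdl`, and `archive_folder("myModelProj_v12+")` writes `myModelProj_v12+.zip` alongside the folder.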

This works pretty well, because you can use Vensim’s Model>Compare command to recover changes between any two model files. To the extent that variable names match, you can use Runs Compare to identify changes in inputs (parameters, data) and behavior.

We’ve adapted this workflow to distributed teams by using Dropbox or Groove to synchronize a shared project folder. However, that requires some discipline, to keep track of who has control of the model.

This folder-based approach is effective, but can get messy when you start branching your model down different experimental paths. It also precludes concurrent work on a model version.

Another way

Software engineers solved this problem a long time ago, with source control systems like CVS, subversion and git. In a nutshell, CVS and subversion are client-server systems that add a time dimension to your file hierarchy (git is distributed, so each user also carries a full local repository). The server manages a repository that keeps versions of files over time. The client lets you check files in and out of the repository, merge changes and resolve conflicts with other users. You can easily identify changes and revert to earlier versions. There are much better explanations elsewhere.

For modeling on a Windows desktop, it’s easy to start with TortoiseSVN. TortoiseSVN is a subversion client that’s integrated with Windows Explorer, so you can manage your project with a simple right-click on files. It includes a desktop server, so you don’t need anything else to run it.

  • Download & install. While you’re at it, get the manual.
  • Create a repository
  • Create a project folder on the repository and import your data; if you’re working with an existing model, you’ll probably want to use the “import in place” method.
    • Heed the advice about cleaning up beforehand – it’ll save work later.
    • Before import, switch binary (.vmf) format models to text (.mdl) – that way you’ll be able to see changes at the equation level. If you’ve been avoiding text format because of the proliferation of bitmap files that get saved when you have embedded graphics, you don’t have to worry about that anymore because you won’t be saving each iteration of the model under a new name.
    • You can remove version numbers from files if you like, because they’re no longer needed (the repository manages history).
  • The rest is easy (but there’s a lot of it).
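To illustrate why text format matters, here is a minimal sketch of a line-level comparison using Python's standard `difflib`. The two "models" are simplified, hypothetical stand-ins, not real .mdl syntax, and `equation_diff` is an illustrative name. A source control client does essentially this for you when you ask for a diff between revisions.

```python
import difflib


def equation_diff(old_text: str, new_text: str,
                  old_name: str = "old.mdl", new_name: str = "new.mdl") -> str:
    """Return a unified diff of two text-format model files."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old_lines, new_lines,
                             fromfile=old_name, tofile=new_name)
    )


# Hypothetical, simplified equations (real .mdl files carry more detail).
OLD = "birth rate = 0.03\nbirths = population * birth rate\n"
NEW = "birth rate = 0.04\nbirths = population * birth rate\n"

if __name__ == "__main__":
    print(equation_diff(OLD, NEW))
```

Only the changed equation shows up with `-` and `+` markers; a binary file would offer nothing comparable.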

One practice you may want to consider is selective tracking of datasets (.vdf files). A small model can generate a lot of output, and storing it all may be a waste of repository space. I’m currently tracking nonvolatile things like data imported to .vdf files from external sources, but not every run ever made. Runs are easily reconstructed from a model if you use some discipline: make experiments replicable by setting them up with changes files (.cin) and/or command scripts (.cmd). The control files for runs are typically much smaller than the resulting output dataset.
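One way to make the selective-tracking rule explicit is a small filter like the sketch below. It assumes a hypothetical naming convention in which imported data lives in files named `data_*.vdf`; the patterns and the function name `should_track` are illustrative and should be adapted to your own project.

```python
from fnmatch import fnmatchcase

# Assumption: simulation output (.vdf) is left out of the repository...
UNTRACKED_PATTERNS = ["*.vdf"]
# ...except imported data, identified here by a hypothetical data_ prefix.
ALWAYS_TRACKED_PATTERNS = ["data_*.vdf"]


def should_track(filename: str) -> bool:
    """Decide whether a file belongs in the repository."""
    name = filename.lower()
    if any(fnmatchcase(name, p) for p in ALWAYS_TRACKED_PATTERNS):
        return True
    return not any(fnmatchcase(name, p) for p in UNTRACKED_PATTERNS)


if __name__ == "__main__":
    for name in ["myModel.mdl", "baseline.cin", "run42.vdf", "data_population.vdf"]:
        print(name, "track" if should_track(name) else "skip")
```

The same patterns could instead be expressed through the source control system's own ignore mechanism, which keeps untracked runs from cluttering status listings.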

If you have a distributed team, there’s one more thing you need: a server that’s accessible from the wider web. You can run your own, or find a host. Once your host repository is populated, checking things in and out is about the same. In fact, if you’re joining an existing project, it’s dead easy: just right click where you want to put your working copy and select Checkout, paste in the repository URL, and grab the folder you want.
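The same checkout can be scripted. The sketch below assumes an `svn` command-line client is installed and on the PATH; the URL and folder name in the usage note are placeholders, and `checkout_command` and `checkout` are hypothetical helper names.

```python
import subprocess
from typing import List


def checkout_command(repo_url: str, working_copy: str) -> List[str]:
    """Build the svn command that checks a repository folder out locally."""
    return ["svn", "checkout", repo_url, working_copy]


def checkout(repo_url: str, working_copy: str) -> None:
    """Run the checkout; raises CalledProcessError if svn reports failure."""
    subprocess.run(checkout_command(repo_url, working_copy), check=True)
```

For example, `checkout("https://example.com/svn/myModelProj/trunk", "myModelProj")` would create a working copy in a local `myModelProj` folder.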

I haven’t been working with subversion for a while, so I’m rediscovering a lot of this. I’ll post additional thoughts as they come up. One thing on my list is to check out git as an alternative to svn. Git maintains fully populated local repositories, which may be a big advantage for modeling because it enables fast comparison with old versions. I’m interested to hear your thoughts on model-source control.

11 thoughts on “Subversive modeling”

    1. I’d never heard of Bazaar, so I googled it. Sounds interesting – I’m also now intrigued by Mercurial. SVN was fine for desktop use, but I’m finding it a bit annoying on a server – the latency of the umbilical is a problem, and I’d really be stuck if I wanted to recover an old model version while away from wifi. Git is much better in this respect, but more complex (or at least the Win clients aren’t as advanced). So, anything (1) simple and (2) p2p is interesting.

  2. I tried out Git with a Vensim model the other day, but it was not able to merge two modified versions of the same .mdl file. The changes that were merged did not conflict with each other in the model, but the merge just did not work. Is there any version control system that can handle merging of Vensim models?

    I haven’t yet looked into the .mdl file format deeply enough to pinpoint the problem, but it needs further investigation. In theory there shouldn’t be a problem with merging Vensim models, right?

    1. Interesting … I assume the merge operation completed, but the result was not a runnable file?

      I haven’t tried it, because I’m working solo, but I’ve seen successful merge operations with subversion, so it should work. I’ll have to experiment a bit.

  3. Actually, the merge did not complete, because of unresolved conflicts, so I ended up with a complete mess of a .mdl file (not runnable).

    I will try git merge once more with more consistency, and documenting my steps and files as I go along. I’ll get back to you.

    If you say subversion should work, I will try the exact same kind of merge with the same files with SVN.

    Looks like a weekend in front of the computer…

  4. Hi again Tom,
    I now reproduced the failing of git’s auto-merge with everything properly documented. Everything can be found inside this archive:
    http://users.tkk.fi/~mcnyman/vensim_and_git/vensim+git.tar.gz

    After looking more closely at the results, I find it hard to believe that subversion would be able to do a merge of two conflicting models. Could you try the same experiment with the same files found in the archive with whatever version control you are using?

    Thanks.

  5. I haven’t really had a chance to dig in, but a few quick observations:

    – This isn’t quite the merge situation I had in mind, because the models aren’t runnable, so a lot differs. It seems like that shouldn’t matter in principle, but in practice it seems that it does, because merge doesn’t recognize equation order, and Vensim may vary it.
    – I don’t think automerge is going to work as a result; any merge is going to take some supervision.
    – The same problems do arise in TortoiseMerge.
    – I think this would be less likely to happen in a long model, organized into groups, where user changes were isolated to separate sections.

  6. Hi Tom,

    I discussed this with a Git consultant, and he confirmed that the merge fails, as it should, because different changes are made to the same line (e.g. the foobar and barfoo equations).
    One possible solution would be the development of a Git merge driver for Vensim models. However, there doesn’t seem to be much demand for version control of Vensim models.

    I haven’t looked into the differences in filetypes of the new Simantics tool being developed here at VTT in Finland, but I am looking forward to it.
    If you haven’t heard of it, I do recommend checking it out at:
    https://www.simantics.org/

    Thank you Tom for your reflections on this merge issue!
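The conflict discussed in this thread can be illustrated with a deliberately simplified three-way merge. The sketch assumes all three versions have the same number of lines (in-place edits only) and compares them line by line. Real tools such as git and subversion work on hunks of changed lines and are generally more conservative, so this is not their algorithm. The name `merge_lines` and the sample equations are hypothetical.

```python
from typing import List, Optional, Tuple


def merge_lines(base: List[str], ours: List[str],
                theirs: List[str]) -> Tuple[List[Optional[str]], List[int]]:
    """Merge two edited versions against a common base, line by line.

    Returns the merged lines (None where a conflict occurred) and the
    indices of the conflicting lines.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged: List[Optional[str]] = []
    conflicts: List[int] = []
    for index, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree (changed identically or not at all)
            merged.append(o)
        elif o == b:      # only their side changed this line
            merged.append(t)
        elif t == b:      # only our side changed this line
            merged.append(o)
        else:             # both sides changed the same line differently
            merged.append(None)
            conflicts.append(index)
    return merged, conflicts


if __name__ == "__main__":
    base = ["foobar = 1", "units = 1", "barfoo = 2"]
    ours = ["foobar = 10", "units = 1", "barfoo = 2"]
    theirs = ["foobar = 5", "units = 1", "barfoo = 2"]
    print(merge_lines(base, ours, theirs))
```

When two users edit different lines the merge is clean; when both edit the same line, as in the usage example, no automatic rule can pick a winner and a person has to decide.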
