Here’s how SD modeling (and science generally) roughly works. Pick a question about a phenomenon of interest. Observe some stuff. Construct some mechanistic theories, what Barry Richmond called “operational thinking.” Test the theories against the observations, and reject or adapt the ones that don’t fit. Also test the theories against conservation laws, dimensional consistency, robustness in extreme conditions, and other things you know a priori from physics, adjacent models, or other levels of description. The quality of the model that survives is proportional to how hard you tried.
To verify what you’ve done, you either need to attempt to control the system according to your theory, or, failing that, make some out-of-sample predictions and gather new data to see if they come true. This could mean replicating a randomized controlled trial, or predicting the future outcome of a change and waiting a while, or attempting to measure something not previously observed.
Now, if you think your mechanistic theories aren’t working, you can simply assert that the system is random, subject to acts of God, etc. But that’s a useless, superstitious, non-falsifiable and non-actionable statement, and you can’t beat something with nothing. In reality, any statement to the effect that “stuff happens” implies a set of possibilities that can be characterized as a null model with some implicit distribution.
If you’re doing something like a drug trial, the null hypothesis (no difference between drug and placebo) and statistics for assessing the actual difference are well known. Dynamic modeling situations are typically more complex and less controlled (i.e. observational), but that shouldn’t stop you. Suppose I predict that GDP/capita in the US will grow according to:
d(GDP/cap)/dt = 2.3% - (top marginal tax rate %)/10 + SIN(2*pi*(Year-2025)/4)
and you argue for a null hypothesis
d(GDP/cap)/dt = 1.8%/yr
we could then compare outcomes. But first, we’d each have to specify the error bars around our predictions. If I’m confident, I might pick N(0, 0.2%), while you argue for “stuff happens” at N(0, 1%). If we evaluate next year and growth turns out to be 1.8%, you seemingly win, because you’re right on the money, whereas my prediction (about 2%) is 1 standard deviation off. But not so fast … my error bars were 5x narrower. To decide who actually wins, we can compare the likelihood of the observed outcome under each forecast.
My log-likelihood is (up to a constant) -LN(sigma) - (error/sigma)^2/2, or:
-LN(0.2) - (0.2/0.2)^2/2 ≈ 1.11
Yours is:
-LN(1) - (0/1)^2/2 = 0
So actually, my log-likelihood is bigger (1.11 vs. 0), and I win. Your forecast happened to be right, but it was so diffuse that it should be penalized for admitting a wide range of possible outcomes.
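If you want to check the arithmetic, here’s a minimal sketch in Python. The forecast means and standard deviations are just the illustrative values above, and this version keeps the 0.5*LN(2*pi) constant; it shifts both numbers equally, so the 1.11-point gap (and the winner) is unchanged.

```python
from math import log, pi

def gaussian_loglik(error, sigma):
    """Log-likelihood of a forecast error under a Normal(0, sigma) error distribution."""
    return -log(sigma) - 0.5 * log(2 * pi) - (error / sigma) ** 2 / 2

actual = 1.8  # observed growth, %/yr

# My forecast: about 2%, with tight error bars of 0.2%
mine = gaussian_loglik(error=2.0 - actual, sigma=0.2)

# Your "stuff happens" null: 1.8%, with wide error bars of 1%
yours = gaussian_loglik(error=1.8 - actual, sigma=1.0)

print(f"mine:  {mine:.2f}")   # about 0.19 (1.11 minus the 0.92 constant)
print(f"yours: {yours:.2f}")  # about -0.92 (0 minus the 0.92 constant)
print("I win" if mine > yours else "you win")
```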
If this seems counterintuitive, consider a shooting contest. I propose to hit a bottle on a distant fencepost with my rifle. You propose to hit the side of a nearby barn with your shotgun. I only nick the bottle, causing it to wobble. You hit the barn. Who is the better shot?