Complexity should be the default assumption

Whether or not we can prove that a system experiences trophic cascades and other nonlinear side-effects, we should manage as if it does, because we know that these dynamics are common.

There’s been a long-running debate over whether wolf reintroduction led to a trophic cascade in Yellowstone. There’s a nice summary here:

Do Wolves Change Rivers?

Yesterday, June initiated an in depth discussion on the benefit of wolves in Yellowstone, in the form of trophic cascade with the video: How Wolves Change the River:

This was predicted by some, and has been studied by William Ripple, Robert Beschta Trophic Cascades in Yellowstone: The first fifteen years after wolf reintroduction http://www.cof.orst.edu/leopold/papers/RippleBeschtaYellowstone_BioConserv.pdf

Shannon, Roger, and Mike, voiced caution that the verdict was still out.

I would like to caution that many of the reported “positive” impacts wolves have had on the environment after coming back to Yellowstone remain unproven or are at least controversial. This is still a hotly debated topic in science but in the popular media the idea that wolves can create a Utopian environment all too often appears to be readily accepted. If anyone is interested, I think Dave Mech wrote a very interesting article about this (attached). As he puts it “the wolf is neither a saint nor a sinner except to those who want to make it so”.

Mech: Is Science in Danger of Sanctifying Wolves

Roger added

I see 2 points of caution regarding reports of wolves having “positive” impacts in Yellowstone. One is that understanding cause and effect is always hard, nigh onto impossible, when faced with changes that occur in one place at one time. We know that conditions along rivers and streams have changed in Yellowstone but how much “cause” can be attributed to wolves is impossible to determine.

Perhaps even more important is that evaluations of whether changes are “positive” or “negative” are completely human value judgements and have no basis in science, in this case in the science of ecology.

-Ely Field Naturalists

Of course, in a forum discussion, this becomes:

Wolves changed rivers.

Not they didn’t.

Yes they did.

(iterate ad nauseam)

Prove it.

… with “prove it” roughly understood to mean establishing that river = a + b*wolves, rejecting the null hypothesis that b=0 at some level of statistical significance.

I would submit that this is a poor framing of the problem. Given what we know about nonlinear dynamics in  networks like an ecosystem, it’s almost inconceivable that there would not be trophic cascades. Moreover, it’s well known that simple correlation would not be able to detect such cascades in many cases anyway.

A “no effect” default in other situations seems equally naive. Is it really plausible that a disturbance to a project would not have any knock-on effects? That stressing a person’s endocrine system would not cause a path-dependent response? I don’t think so. Somehow we need ordinary conversations to employ more sophisticated notions about models and evidence in complex systems. I think at least two ideas are useful:

  • The idea that macro behavior emerges from micro structure. The appropriate level of description of an ecosystem, or a project, is not a few time series for key populations, but an operational, physical description of how species reproduce and interact with one another, or how tasks get done.
  • A Bayesian approach to model selection, in which our belief in a particular representation of a system is proportional to the degree to which it explains the evidence, relative to various alternative formulations, not just a naive null hypothesis.

In both cases, it’s important to recognize that the formal, numerical data is not the only data applicable to the system. It’s also crucial to respect conservation laws, units of measure, extreme conditions tests and other Reality Checks that essentially constitute free data points in parts of the parameter space that are otherwise unexplored.

The way we think and talk about these systems guides the way we act. Whether or not we can prove in specific instances that Yellowstone had a trophic cascade, or the Chunnel project had unintended consequences, we need to manage these systems as if they could. Complexity needs to be the default assumption.

A Grizzly-Pine-Nutcracker CLD Rework

I spent a little time working out what Clark’s Causal Calamity might look like as a well-formed causal loop diagram. Here’s an attempt, based on little more than spending a lot of time wandering around the Greater Yellowstone ecosystem:

The basic challenge is that there isn’t a single cycle that encompasses the whole system. Grizzlies, for example, are not involved in the central loop of pine-cone-seedling dispersal and growth (R1). They are to some extent free riders on the system – they raid squirrel middens and eat a lot of nuts, which can’t be good for the squirrels (dashed line, loop B5).

There are also a lot of “nuisance” loops  that are essential for robustness of the real system, but aren’t really central to the basic point about ecosystem interconnectedness. B6 is one example – you get such a negative loop every time you have an outflow from a stock (more stuff in the stock -> faster outflow -> less stuff in the stock). R2 is another – the development of clearings from pines via fire and pests is offset by the destruction of pines via the same process.

I suspect that this CLD is still dramatically underspecified and erroneous, compared to the simplest stock-flow model that could encompass these concepts. It would also make a lousy poster for grocery store consumption.

Happy E day

E, a.k.a. Euler’s number or the base of the natural logarithm, is near and dear to dynamic modelers. It’s not just the root of exponential growth and decay; thanks to Euler’s Formula it encompasses oscillation, and therefore all things dynamic.

E is approximately 2.718, and today is 2/7/18, at least to Americans, so this is the biggest e day for a while. (NASA has the next 1,999,996 digits, should you need them.) Unlike π, e has not been contested in any state legislature that I know of.

Vi Hart on positive feedback driving polarization

Vi Hart’s interesting comments on the dynamics of political polarization, following the release of an innocuous video:

I wonder what made those commenters think we have opposite views; surely it couldn’t just be that I suggest people consider the consequences of their words and actions. My working theory is that other markers have placed me on the opposite side of a cultural divide that they feel exists, and they are in the habit of demonizing the people they’ve put on this side of their imaginary divide with whatever moral outrage sounds irreproachable to them. It’s a rather common tool in the rhetorical toolset, because it’s easy to make the perceived good outweigh the perceived harm if you add fear to the equation.

Many groups have grown their numbers through this feedback loop: have a charismatic leader convince people there’s a big risk that group x will do y, therefore it seems worth the cost of being divisive with those who think that risk is not worth acting on, and that divisiveness cuts out those who think that risk is lower, which then increases the perceived risk, which lowers the cost of being increasingly divisive, and so on.

The above feedback loop works great when the divide cuts off a trust of the institutions of science, or glorifies a distrust of data. It breaks the feedback loop if you act on science’s best knowledge of the risk, which trends towards staying constant, rather than perceived risk, which can easily grow exponentially, especially when someone is stoking your fear and distrust.

If a group believes that there’s too much risk in trusting outsiders about where the real risk and harm are, then, well, of course I’ll get distrustful people afraid that my mathematical views on risk/benefit are in danger of creating a fascist state. The risk/benefit calculation demands it be so.

Reforesting Iceland

The NYT has an interesting article on the difficulties of reforesting Iceland.

This is an example of forest cover tipping points.

Iceland appears to be stuck in a state in which “no trees” is locally stable. So, the system pushes back when you try to reforest, at least until you can cross into another basin of attraction that’s forested.

Interestingly, in the Hirota et al. data above, a stable treeless state is a product of low precipitation. But Iceland is wet. So, deserts are a multidimensional thing.

A Bongard problem

Bongard problems test visual pattern recognition, but there’s no reason to be strict about that. Here’s a slightly nontraditional Bongard problem:

The six on the left conform to a pattern or rule, and your task is to discover it. As an aid, the six boxes on the right do not conform to the same pattern. They might conform to a different pattern, or simply reflect the negation of the rule on the left. It’s possible that more than one rule discriminates between the sets, but the one that I have in mind is not strictly visual (that’s a hint).

If you’re stumped, you might go read this nice article about meta-rationality instead.

I’ll post the solution in a few days. Post your guess in comments (no peeking).

Data Science should be about more than data

There are lots of “top 10 skills” lists for data science and analytics. The ones I’ve seen are all missing something huge.

Here’s an example:

Business Broadway – Top 10 Skills in Data Science

Modeling barely appears here. Almost all the items concern the collection and analysis of data (no surprise there). Just imagine for a moment what it would be like if science consisted purely of observation, with no theorizing.

What are you doing with all those data points and the algorithms that sift through them? At some point, you have to understand whether the relationships that emerge from your data make any sense and answer relevant questions. For that, you need ways of thinking and talking about the structure of the phenomena you’re looking at and the problems you’re trying to solve.

I’d argue that one’s literacy in data science is greatly enhanced by knowledge of mathematical modeling and simulation. That could be system dynamics, control theory, physics, economics, discrete event simulation, agent based modeling, or something similar. The exact discipline probably doesn’t matter, so long as you learn to formalize operational thinking about a problem, and pick up some good habits (like balancing units) along the way.

Snow is Normal in Montana

In this case, I think it’s quite literally Normal a.k.a. Gaussian:

Normally distributed snow

Here’s what I think is happening. On windless days with powder, the snow dribbles off the edge of the roof (just above the center of the hump). Flakes drift down in a random walk. The railing terminates the walk after about four feet, by which time the distribution of flake positions has already reached the Normal you’d expect from the Central Limit Theorem.

Enough of the geek stuff; I think I’ll go ski the field.