After getting past the provocative title, Robert Axtell’s presentation on the pitfalls of aggregation proved to be very interesting. The slides are posted here:

A comment on my last post on this summed things up pretty well:

… the presentation really focused on the challenges that aggregation brings to the modeling disciplines. Axtell presents some interesting mathematical constructs that could and should form the basis for conversations, thinking, and research in the SD and other aggregate modeling arenas.

It’s worth a look.

Also, as I linked before, check out Hazhir Rahmandad’s work on agent vs. aggregate models of an infection process. His models and articles with John Sterman are here. His thesis is here.

Hazhir’s work explores two extremes – an aggregate model of infection (the analog of typical Bass diffusion models in marketing science) compared to agent-based versions of the same process. The key difference is that the aggregate model assumes well-mixed victims, while the agent versions explicitly model contacts across various network topologies. The well-mixed assumption is often unrealistic, because it matters *who* is infected, not just how many. In the real world, the gain of an infection process can vary with the depth of penetration of the social network, and only the agent model can capture this in all circumstances.

However, in modeling there’s often a middle road: an aggregation approach that captures the essence of a granular process at a higher level. That’s fortunate, because otherwise we’d always be building model-maps as big as the territory. I just ran across an interesting example.

A new article in PLoS Computational Biology models obesity as a social process:

Many behavioral phenomena have been found to spread interpersonally through social networks, in a manner similar to infectious diseases. An important difference between social contagion and traditional infectious diseases, however, is that behavioral phenomena can be acquired by non-social mechanisms as well as through social transmission. We introduce a novel theoretical framework for studying these phenomena (the SISa model) by adapting a classic disease model to include the possibility for ‘automatic’ (or ‘spontaneous’) non-social infection. We provide an example of the use of this framework by examining the spread of obesity in the Framingham Heart Study Network. … We find that since the 1970s, the rate of recovery from obesity has remained relatively constant, while the rates of both spontaneous infection and transmission have steadily increased over time. This suggests that the obesity epidemic may be driven by increasing rates of becoming obese, both spontaneously and transmissively, rather than by decreasing rates of losing weight. A key feature of the SISa model is its ability to characterize the relative importance of social transmission by quantitatively comparing rates of spontaneous versus contagious infection. It provides a theoretical framework for studying the interpersonal spread of any state that may also arise spontaneously, such as emotions, behaviors, health states, ideas or diseases with reservoirs.
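To make the structure in the abstract concrete, here is a minimal sketch of a well-mixed SISa model. The equation form (dI/dt = beta*S*I + a*S - g*I, with S and I as population fractions), the function names, and the parameter values are my assumptions for illustration; they are not the paper's fitted estimates.

```python
import math


def sisa_rate(i, beta, a, g):
    """Net rate of change of the infected fraction i (assumed well-mixed form)."""
    s = 1.0 - i                       # susceptible fraction, since S + I = 1
    return beta * s * i + a * s - g * i


def simulate_sisa(i0, beta, a, g, t_end=200.0, dt=0.01):
    """Euler-integrate the infected fraction from i0 and return its final value."""
    i = i0
    for _ in range(round(t_end / dt)):
        i += dt * sisa_rate(i, beta, a, g)
    return i


def sisa_equilibrium(beta, a, g):
    """Positive root of beta*i^2 - (beta - a - g)*i - a = 0 (requires beta > 0)."""
    b = beta - a - g
    return (b + math.sqrt(b * b + 4.0 * beta * a)) / (2.0 * beta)


if __name__ == "__main__":
    # Hypothetical rates: contagion, spontaneous infection, recovery.
    beta, a, g = 0.4, 0.01, 0.2
    print("simulated :", simulate_sisa(0.0, beta, a, g))
    print("analytic  :", sisa_equilibrium(beta, a, g))
    # With a = 0 and nobody infected initially, nothing ever happens.
    print("no spontaneous infection:", simulate_sisa(0.0, beta, 0.0, g))
```

The last line shows what the spontaneous term adds: with a = 0 an uninfected population stays uninfected, while any a > 0 seeds the contagion loop without an initial case.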

The very idea of modeling obesity as an infectious social process is interesting in itself. But from a technical standpoint, the key innovation is that they capture some of the flavor of a disaggregate representation of the population by introducing an approximation:

We can use a pair-wise approximation to formulate the infectious process on a network structure in terms of differential equations. The fundamental variables are numbers of individuals of each type, and also the pairs of individuals, [XY] (where the edges are not directional). Because [XY] = [YX], and the total individuals and total edges is constant, the system can be reduced to three equations.

I haven’t had a chance to dig into the details, but the strategy is interesting. The original infection model is effectively first order. It has two states (S = susceptible, I = infected), but they’re tied together by the conservation constraint N = S + I, so only one of them is independent. Like the Bass model, it reduces to the logistic model when word of mouth or contagion is the only process (no marketing or spontaneous conversion). The enhanced version augments the two states to three: I, [II] pairs and [SI] pairs. I suspect that the S+I=N constraint still limits the effective order to 2.
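The first-order claim and the logistic reduction can be written out in a few lines. This is a sketch under an assumed well-mixed form; the symbols β (contagion), a (spontaneous infection), g (recovery), and the Bass coefficients p and q are my notation, not necessarily the paper's.

```latex
\begin{align}
\frac{dI}{dt} &= \beta \frac{S I}{N} + a S - g I
  && \text{assumed well-mixed SISa form} \\
\frac{dI}{dt} &= \left(a + \beta \frac{I}{N}\right)(N - I) - g I
  && \text{substituting } S = N - I \text{: one state remains} \\
\frac{dI}{dt} &= \beta I \left(1 - \frac{I}{N}\right)
  && \text{with } a = 0,\ g = 0 \text{: the logistic model} \\
\frac{dA}{dt} &= \left(p + q \frac{A}{N}\right)(N - A)
  && \text{Bass diffusion, adopters } A
\end{align}
```

Setting g = 0 in the second line gives the Bass form with a playing the role of p and β the role of q; dropping the spontaneous term as well leaves the logistic equation.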

The effect of this enhancement is to provide a representation of variation in infection rates that arises from contact network structure:

The result of a network structure is that the number of partnerships between susceptible and infected individuals quickly becomes less than if random, and so […]. We can compare Eq. 7 to the well mixed result (Eq. 2), and see that the effect of the network is to lower the effective transmission rate by a factor of […], and hence lower the prevalence, due to these correlations that build up locally. …

Analyzing the n-regular pair-wise equations allows us to get analytic results and determine how and under what conditions network structure affects the spread of behaviors which are both spontaneously acquired and spread interpersonally. Although simple closed-form solutions do not exist when […] is non-zero, these equations can easily be integrated or numerically solved to get solutions. These equations ignore heterogeneities in the number of edges for different individuals, which can facilitate spread under some conditions ….
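For a feel of how such pair-wise equations behave, here is a rough sketch that integrates a textbook-style pair approximation on an n-regular network next to its well-mixed counterpart. I have not checked it against the paper's Eq. 7. The closure used (triples approximated as (n-1)/n times the product of pairs over singles, with no clustering), the per-partnership rate tau, and all parameter values are assumptions on my part.

```python
def well_mixed_rate(i, n, tau, a, g):
    """dI/dt when S-I partnerships are as common as random mixing implies."""
    s = 1.0 - i
    return tau * n * s * i + a * s - g * i


def pairwise_rates(i, x, n, tau, a, g):
    """Rates for infected fraction i and S-I partnerships per person x.

    Pairs are counted in both directions, so [SS] = n*S - [SI] and
    [II] = n*I - [SI]; only i and x need to be tracked.
    """
    s = 1.0 - i
    ss = n * s - x                    # S-S pairs per person
    ii = n * i - x                    # I-I pairs per person
    k = (n - 1.0) / n                 # assumed closure factor, no clustering
    ssi = k * ss * x / s              # approximate S-S-I triples
    isi = k * x * x / s               # approximate I-S-I triples
    di = tau * x + a * s - g * i
    dx = g * (ii - x) - tau * x + tau * (ssi - isi) + a * (ss - x)
    return di, dx


def equilibrium_prevalence(n=4, tau=0.1, a=0.01, g=0.2,
                           i0=0.01, t_end=400.0, dt=0.01):
    """Euler-integrate both models; return (i_mixed, i_pairwise, x)."""
    i_mix = i_pair = i0
    x = n * (1.0 - i0) * i0           # start with randomly placed infections
    for _ in range(round(t_end / dt)):
        i_mix += dt * well_mixed_rate(i_mix, n, tau, a, g)
        di, dx = pairwise_rates(i_pair, x, n, tau, a, g)
        i_pair += dt * di
        x += dt * dx
    return i_mix, i_pair, x


if __name__ == "__main__":
    i_mix, i_pair, x = equilibrium_prevalence()
    print("well-mixed prevalence:", i_mix)
    print("pair-wise prevalence :", i_pair)
    print("S-I pairs vs. random :", x, "vs.", 4 * (1.0 - i_pair) * i_pair)
```

In this sketch the two constraints on pairs leave just two tracked variables, the infected fraction and the S-I partnerships. The run ends with fewer S-I partnerships than random mixing would imply, which is the mechanism the quoted passage describes.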

An interesting feature of this approach is that it permits exploitation of both macro data (population in each state) and micro data (observed transitivity of the contact network among subjects). That’s not quite as strong as mimicking the exact properties of the contact network topology in an agent model, but it has an important benefit. It separates properties of the network structure (transitivity) that cause time variation in infection rates from properties of the transmission process itself (the probability of transmission from contact between susceptible and infected). That eliminates at least one confounding issue that accounts for the lousy track record of simple-model forecasts of infection processes.
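Transitivity is easy to compute from observed contact data. The sketch below uses one common definition (the fraction of connected triples that close into triangles); the adjacency-dictionary format and the toy network are hypothetical, and the paper may measure transitivity differently.

```python
from itertools import combinations


def transitivity(adjacency):
    """Fraction of connected triples that are closed into triangles.

    adjacency maps each node to the set of its neighbors (undirected).
    """
    closed = 0
    triples = 0
    for neighbors in adjacency.values():
        # Every pair of a node's neighbors forms a triple centered on it.
        for u, v in combinations(sorted(neighbors), 2):
            triples += 1
            if v in adjacency[u]:
                closed += 1
    return closed / triples if triples else 0.0


if __name__ == "__main__":
    # Toy network: a triangle A-B-C with one extra contact C-D.
    contacts = {
        "A": {"B", "C"},
        "B": {"A", "C"},
        "C": {"A", "B", "D"},
        "D": {"C"},
    }
    print(transitivity(contacts))
```

A star network scores zero and a fully connected clique scores one, so the measure summarizes how much "friends of friends are friends" without storing the whole topology.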

My guess is that the argument in this case for an agent model rather than the aggregate dynamic structure is a matter of representation rather than agent heterogeneity or network topology. You can’t be a little bit pregnant, but real people aren’t simply obese or not. There’s a continuum of body mass index values, and the relationship between the BMI of a subject’s contacts and their influence on behavior is probably also continuous. One might discretize the S-I states into a few discrete BMI categories, but that would most likely destroy the transparency and analytical tractability of the aggregate dynamics. At that point, it would make sense to switch to an agent representation permitting explicit representation of individual BMIs. That would also allow one to explore whether co-evolution of the network structure with individual BMIs matters (a “birds of a feather flock together” effect).

Fortunately, when all else fails,

Full stochastic simulations on large networks can be carried out to determine how and when the results differ.
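A stochastic version takes only a few lines. This is a minimal sketch, not the paper's simulation: the ring-lattice network, the synchronous discrete-time update (each event happens with probability rate times dt, which is only reasonable for small dt), and every name and parameter value are assumptions for illustration.

```python
import random


def ring_lattice(num_nodes, n):
    """n-regular ring: each node is linked to its n/2 nearest neighbors on
    each side (n even, num_nodes > n). Returns a list of neighbor lists."""
    half = n // 2
    return [[(v + d) % num_nodes for d in range(-half, half + 1) if d != 0]
            for v in range(num_nodes)]


def simulate_network_sisa(neighbors, tau, a, g, steps,
                          dt=0.1, i0=0.05, seed=0):
    """Return the infected fraction after each of `steps` time steps."""
    rng = random.Random(seed)
    num_nodes = len(neighbors)
    infected = [rng.random() < i0 for _ in range(num_nodes)]
    history = []
    for _ in range(steps):
        nxt = list(infected)          # synchronous update from the old state
        for v in range(num_nodes):
            if infected[v]:
                if rng.random() < g * dt:              # recovery
                    nxt[v] = False
            else:
                k = sum(infected[u] for u in neighbors[v])
                if rng.random() < (a + tau * k) * dt:  # spontaneous + contact
                    nxt[v] = True
        infected = nxt
        history.append(sum(infected) / num_nodes)
    return history


if __name__ == "__main__":
    net = ring_lattice(1000, 4)
    path = simulate_network_sisa(net, tau=0.1, a=0.01, g=0.2, steps=2000)
    tail = path[len(path) // 2:]
    print("mean prevalence, second half of run:", sum(tail) / len(tail))
```

Swapping `ring_lattice` for a network built from real contact data, or one with heterogeneous degree, is the natural way to see when the aggregate approximations start to differ from the agent-level result.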