Big Data Gone Bad

An integrated market model is a hungry beast. It wants data from a variety of areas of a firm’s business, often from a variety of sources. As I said in my previous post, typically these data streams have never been considered together before, and therefore they’re full of contradictions and quality issues. Here’s a real world example, from the pharma business. The details are proprietary, and I’ve stylized the data, but the story is pretty simple.

Suppose you have a product with two different indications. One is short term (for injuries, a 4 month treatment), and one is long term (for a chronic condition, over 24 months). It’s of obvious interest to understand the two markets individually, to enable allocation of resources to distinct marketing efforts for each set of doctors and patients.

Here’s the structure of the market:

New patients are started on therapy. They remain in the stock of Patients for some time, before they drop out of therapy or switch to another drug. Initially, just the short term indication is approved; the long term indication gets approved a year into the simulation:

There are twice as many short term starts, but the long term patients stick around 6 times as long, so ultimately there are a lot more of them:

Notice that this is simple first-order goal-seeking behavior. The long term patient population is rising toward an equilibrium of (1000 patients/month) x (24 months persistence) = 24,000 patients, over a time scale of 24 months.
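
For concreteness, here's a minimal sketch of that first-order bathtub in Python (not the actual Vensim model; the round numbers are the stylized ones above, and the launch delay just shifts the time origin):

    import math

    starts = 1000.0       # long term starts, patients/month (after launch)
    persistence = 24.0    # average months on therapy
    dt = 1.0              # time step, months

    patients = 0.0
    for month in range(120):
        dropouts = patients / persistence      # first-order outflow
        patients += (starts - dropouts) * dt   # integrate the stock

    equilibrium = starts * persistence                            # 24,000 patients
    analytic = equilibrium * (1 - math.exp(-120 / persistence))   # step response
    print(round(patients), round(analytic), equilibrium)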

Puzzle #1

Suppose the data for the long term patients is doing something different (note that the colors now refer to model and data):

The model is goal-seeking, but the patient population data keeps rising. Bathtub dynamics says that it’s impossible for the step in the inflow of starts to integrate to this pattern when the outflow of dropouts is first order. You’d have to conclude that the model can’t fit the data, without invoking some additional assumptions. For example, the persistence of the long term patients might be increasing as doctors gain experience or the composition of the patient population changes.

But what if I told you that the driving data, new starts, isn’t a “real” measurement? First, new prescriptions aren’t easy to distinguish from refills, and there’s a certain amount of overcounting when patients switch pharmacies or otherwise drop out of the data, then reappear. Second, the short term and long term patients take the same drug, and prescription records don’t say why. So, the data vendor infers the split from dosages, prescriber specialties, and the phase of the moon. The inference happens in an undocumented black box algorithm and there’s no way to establish the ground truth of its performance.

Now, do you trust the algorithm, or doctors who say they know the duration of treatment – but might be missing something too?

Puzzle #2

Even in the presence of algorithmic uncertainty, you’d expect certain dynamic reality checks to pass. Consider the share of long term patients in the market. For new starts, it’s a step function, rising from 0 to 1/3 at launch in month 12:

Again, from the bathtub, we know that the patient population can’t instantly mimic the step in starts. If the system is first order with constant persistence, the long term share of patients should rise gradually to 3/4 (1000*24/(1000*24+2000*4)). If persistence is increasing, per puzzle #1, it might go higher on a longer time scale, but it can’t go faster.
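
A quick check of that arithmetic, as a Python sketch with the same stylized numbers:

    # Equilibrium stock for each segment is inflow * persistence, so the long
    # term share of patients ends up much higher than its 1/3 share of starts.
    long_starts, long_persistence = 1000.0, 24.0    # patients/month, months
    short_starts, short_persistence = 2000.0, 4.0

    long_eq = long_starts * long_persistence        # 24,000 patients
    short_eq = short_starts * short_persistence     #  8,000 patients

    print(long_starts / (long_starts + short_starts))   # share of starts ~ 0.33
    print(long_eq / (long_eq + short_eq))               # share of patients -> 0.75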

Now, suppose the data does something unexpected:

Here, the patient population share data mimics the share of new starts with a time constant that’s very short compared to the persistence of therapy. This should be dynamically impossible in a simple system. But, as always, you could start invoking time varying inputs or parameters to explain what the data shows. (And remember that the real data is noisy, making it harder to be sure about anything.)

But I think there’s another, simpler explanation. The data vendor could be using the same or similar algorithms to classify new starts and existing patients. It could be wrong about the inflow split, or wrong about the stocks, or both. And, it could be reclassifying existing patients from short to long and back with a time constant much faster than the persistence of therapy permits.
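
To see how much mischief that could cause, here's a purely hypothetical Python sketch: suppose the vendor re-scores existing patients with the same rule it applies to new starts, so the reported long term share of the stock chases the share of starts with a short reclassification time constant, regardless of real persistence. (The 2-month time constant is an assumption for illustration only.)

    start_share = 1.0 / 3.0   # long term share of new starts after launch
    relabel_tau = 2.0         # assumed reclassification time constant, months
    dt = 1.0

    reported_share = 0.0      # vendor-reported long term share of patients
    for month in range(12):
        # first-order adjustment toward the share implied by the classifier
        reported_share += (start_share - reported_share) / relabel_tau * dt
        print(month + 1, round(reported_share, 3))
    # The reported share reaches ~1/3 within a few months, while the true share
    # should take on the order of the 24-month persistence to move.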

Conclusion

It turns out that, in spite of having lots of data about this system, we don't actually know much. This is a problem for model calibration, because we don't know which source to trust. Uncertainty in the calibration propagates into decision making. It's awkward for people in the firm to revise the stories they've used to justify past actions. It ought to be awkward for the data vendor to provide flaky information, but luckily for them, they have a near-monopoly.

But we still have options:

  • Track down the data issues. This is the most attractive idea in principle, but it might be slow and expensive to find someone at the data vendor who knows what’s going on, and even then the answer might be unsatisfactory.
  • Model the data. If some details of the data collection process are known, it’s often possible to reverse engineer the “real” data from flawed measurements.
  • Split the difference. Calibrate as best you can to all available information, including gut feel and known “physics” of the situation, not just the numerical data.
  • Embrace the uncertainty. If no theory fits the data, look for policies that are robust to alternative futures, and convey the irreducible uncertainty of the situation to decision makers.

A real challenge for modelers is that model consumers typically have science tastes on a propaganda budget. People are used to seeing data that looks precise, full of enticing detail, with conclusions that sound plausible but are little more than superstition. It's cheap to make nice graphics and long, figure-rich PowerPoint decks.

Really sorting out what’s going on in situations like this is hard, but it can have great strategic value. For example, in this case, if persistence is increasing, it’s more critical than ever to win the long term patients. If market shares could differ dramatically from what measurements report, competitive threats and opportunities could go unnoticed. Anyone who can use models to discover the fog of data and see through it will have a real competitive edge.

All data are wrong!

Simple descriptions of the Scientific Method typically run like this:

  • Collect data
  • Look for patterns
  • Form hypotheses
  • Gather more data
  • Weed out the hypotheses that don’t fit the data
  • Whatever survives is the truth

There’s obviously more to it than that, but every popular description I’ve seen leaves out one crucial aspect. Frequently, when the hypothesis doesn’t fit the data, it’s the data that’s wrong. This is not an invitation to cherry pick your data; it’s just recognition of a basic problem, particularly in social and business systems.

Any time you are building an integrated systems model, it’s likely that you will have to rely on data from a variety of sources, with differences in granularity, time horizons, and interpretation. Those data streams have probably never been combined before, and therefore they haven’t been properly vetted. They’re almost certain to have problems. If you’re only looking for problems with your hypothesis, you’re at risk of throwing the good model baby out with the bad data bathwater.

The underlying insight is that data is not really distinct from models; it comes from processes that are full of implicit models. Even “simple” measurements like temperature are really complex and assumption-laden, but at least we can easily calibrate thermometers and agree on the definition and scale of Kelvin. This is not always the case for organizational data.

A winning approach, therefore, is to pursue every lead:

  • Is the model wrong?
    • Does it pass or fail extreme conditions tests, conservation laws, and other reality checks?
    • How exactly does it miss following the data, systematically?
    • What feedbacks might explain the shortcomings?
  • Is the data wrong?
    • Do sources agree?
    • Does it mean what people think it means?
    • Are temporal patterns dynamically plausible?
  • If the model doesn’t fit the data, which is to blame?

When you’re building a systems model, it’s likely that you’re a pioneer in uncharted territory, and therefore you’ll learn something new and valuable either way.

Happy E day

E, a.k.a. Euler’s number or the base of the natural logarithm, is near and dear to dynamic modelers. It’s not just the root of exponential growth and decay; thanks to Euler’s Formula it encompasses oscillation, and therefore all things dynamic.
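(In symbols: e^((a + i*w)*t) = e^(a*t)*(cos(w*t) + i*sin(w*t)), so a single complex exponential captures exponential growth or decay in a and oscillation in w at the same time.)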

E is approximately 2.718, and today is 2/7/18, at least to Americans, so this is the biggest e day for a while. (NASA has the next 1,999,996 digits, should you need them.) Unlike π, e has not been contested in any state legislature that I know of.

Polynomials & Interpolating Functions for Decision Rules

Sometimes it’s useful to have a way to express a variable as a flexible function of time, so that you can find the trajectory that maximizes some quantity like profit or fit to data. A caveat: this is not generally the best thing to do. A simple feedback rule will be more robust to rescaling and uncertainty and more informative than a function of time. However, there are times when it’s useful for testing or data approximation to have an open-loop decision rule. The attached models illustrate some options.

If you have access to arrays in Vensim, the simplest is to use the VECTOR LOOKUP function, which reads a subscripted table of values with interpolation. However, that has two limitations: a uniform time axis, and linear interpolation.
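
Conceptually that's just a table of values on a uniform time grid, read back with linear interpolation. A rough Python stand-in (numpy's interp, with illustrative numbers, in place of the Vensim function):

    import numpy as np

    # A decision variable tabulated at uniform times, read back by interpolation.
    table_times = np.arange(0, 101, 10)    # uniform time axis, months 0..100
    table_values = np.array([0, 2, 5, 4, 6, 8, 7, 9, 10, 9, 8], dtype=float)

    def decision(t):
        return np.interp(t, table_times, table_values)   # piecewise-linear in t

    print(decision(37.5))   # lies between the tabulated values at t=30 and t=40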

If you want a smooth function, a natural option is to pick a polynomial, like

y = a + b*t + c*t^2 + d*t^3 …

However, it can be a little fiddly to interpret the coefficients or get them to produce a desired behavior. The Legendre polynomials provide a basis with nicer scaling, which still recovers the basic linear, quadratic, cubic (etc.) terms when needed. (In terms of my last post, their improved properties make them less sloppy.)
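
As a sketch of the two bases in Python (numpy's polynomial module; the coefficients are arbitrary placeholders):

    import numpy as np
    from numpy.polynomial import legendre, polynomial

    t = np.linspace(-1, 1, 5)         # Legendre polynomials are orthogonal on [-1, 1]
    coeffs = [1.0, 0.5, -0.3, 0.2]    # arbitrary a, b, c, d

    plain = polynomial.polyval(t, coeffs)   # a + b*t + c*t^2 + d*t^3
    leg = legendre.legval(t, coeffs)        # same coefficients, Legendre basis
    print(plain)
    print(leg)
    # Same family of curves, but the orthogonal Legendre terms keep the fitted
    # coefficients better scaled and less strongly correlated.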

 

You can generalize these to 2 dimensions by taking tensor products of the 1D series. Another option is to pick the first n terms of Pascal’s triangle. These yield essentially the same result, and either way, things get complex fast.

Back to 1D series, what if you want to express the values as a sequence of x-y points, with smooth interpolation, rather than arcane coefficients? One option is the Lagrange interpolating polynomial. It's simple to implement and has continuous derivatives, but it's an N^2 problem and therefore potentially compute-intensive. It may also behave badly outside its interval, or even inside it, due to ringing.
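
A quick Python illustration (scipy ships an implementation; the points are made up):

    import numpy as np
    from scipy.interpolate import lagrange

    # Trajectory specified directly as x-y points.
    t_points = np.array([0.0, 5.0, 10.0, 20.0, 30.0])
    y_points = np.array([0.0, 2.0, 3.0, 2.5, 4.0])

    poly = lagrange(t_points, y_points)   # degree len(t_points) - 1 polynomial
    print(poly(12.0))                     # smooth interpolation inside the interval
    print(poly(40.0))                     # extrapolation beyond it can misbehave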

Probably the best choice for a smooth trajectory specified by x-y points (and optionally, the slope at each point) is a cubic spline or Bezier curve.
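
For example, in Python (a sketch of the idea; the .mdl files below implement it natively in Vensim, and the slope values here are assumed purely for illustration):

    import numpy as np
    from scipy.interpolate import CubicSpline, CubicHermiteSpline

    t_points = np.array([0.0, 5.0, 10.0, 20.0, 30.0])
    y_points = np.array([0.0, 2.0, 3.0, 2.5, 4.0])

    spline = CubicSpline(t_points, y_points)   # smooth, continuous 1st & 2nd derivatives
    print(spline(12.0), spline(12.0, 1))       # value and slope at t = 12

    # To also pin the slope at each point (the optional extra control mentioned
    # above), a Hermite spline takes y and dy/dx together:
    slopes = np.array([0.5, 0.2, 0.0, 0.1, 0.3])   # assumed slopes, for illustration
    hermite = CubicHermiteSpline(t_points, y_points, slopes)
    print(hermite(12.0))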

Polynomials1.mdl – simple smooth functions, Legendre, Lagrange and spline, runs in any version of Vensim

InterpolatingArrays.mdl InterpolatingArrays.vpm – array functions, VECTOR LOOKUP, Lagrange and spline, requires Pro/DSS or the free Reader

Sloppy System Dynamics

This post should be required reading for all modelers. And no, I’m not going to reproach sloppy modeling practices. This is much more interesting than that.

The idea of sloppy models formalizes a statement Jay Forrester made long ago, in Industrial Dynamics (13.5):

The third and least important aspect of a model to be considered in judging its validity concerns the values for its parameters (constant coefficients). The system dynamics will be found to be relatively insensitive to many of them. They may be chosen anywhere within a plausible range. The few sensitive parameters will be identified by model tests, and it is not so important to know their past values as it is to control their future values in a system redesign.

This remains true when you’re interested in estimation of parameters from data. At Ventana, we rely on the fact that structure and parameters for which you have no measurements will typically reveal themselves in the dynamics, if they’re dynamically important. (There are always pathological cases, where a nonlinearity makes something irrelevant in the past important in the future, but that’s why we don’t base models solely on formal data.)


Vi Hart on positive feedback driving polarization

Vi Hart’s interesting comments on the dynamics of political polarization, following the release of an innocuous video:

I wonder what made those commenters think we have opposite views; surely it couldn’t just be that I suggest people consider the consequences of their words and actions. My working theory is that other markers have placed me on the opposite side of a cultural divide that they feel exists, and they are in the habit of demonizing the people they’ve put on this side of their imaginary divide with whatever moral outrage sounds irreproachable to them. It’s a rather common tool in the rhetorical toolset, because it’s easy to make the perceived good outweigh the perceived harm if you add fear to the equation.

Many groups have grown their numbers through this feedback loop: have a charismatic leader convince people there’s a big risk that group x will do y, therefore it seems worth the cost of being divisive with those who think that risk is not worth acting on, and that divisiveness cuts out those who think that risk is lower, which then increases the perceived risk, which lowers the cost of being increasingly divisive, and so on.

The above feedback loop works great when the divide cuts off a trust of the institutions of science, or glorifies a distrust of data. It breaks the feedback loop if you act on science’s best knowledge of the risk, which trends towards staying constant, rather than perceived risk, which can easily grow exponentially, especially when someone is stoking your fear and distrust.

If a group believes that there’s too much risk in trusting outsiders about where the real risk and harm are, then, well, of course I’ll get distrustful people afraid that my mathematical views on risk/benefit are in danger of creating a fascist state. The risk/benefit calculation demands it be so.

How to ensure that your survey data is useless for dynamic modeling

I’ve been working with pharma brand tracking data, used to calibrate part of an integrated model of prescriptions in a disease class. Understanding docs’ perceptions of drugs is pretty important, because it’s the major driver of rx. Drug companies spend a lot of money on this data; vendors collect it through quarterly interviews with doctors in a variety of specialties.

Unfortunately, most of the data is poorly targeted for dynamic modeling. It seems to be collected to track and guide ad messaging, but that creates turbulence in the data that prevents drawing any long term conclusions from it. That’s likely to lead to reactive decision making. Here’s how to minimize strategic information content:

  1. Ask a zillion questions. Be sure that interviewees have thorough decision fatigue by the time you get to anything important.
  2. Ask numerical questions that require recall of facts no one can remember (how many patients did you treat with X in the last 3 months?).
  3. Change the questions as often as possible, to ensure that you never revisit the same topic twice. (Consistency is so 2015.)
  4. Don’t document those changes.
  5. Avoid cardinal scales. Use vague nominal categories wherever possible. Don’t waste time documenting those categories.
  6. Keep the sample small, but report results in lots of segments.
  7. Confidence bounds? Bah! Never show weakness.
  8. Archive the data in PowerPoint.

On the other hand, please don’t! A few consistent, well-quantified questions are pure gold if you want to untangle causality that plays out over more than a quarter.

Where are the dynamic project managers?

Project management has been one of the most productive and successful areas of system dynamics. And yet, when I recently looked at project management tools and advice, I couldn’t find a hint of SD insights into project management. Lists of reasons for project failure almost entirely neglect endogenous explanations.

Nothing about rework, late change orders, design/implementation balance, schedule pressure effects on quality and productivity, overtime burnout and turnover, Brooks’ Law, multiphase resource allocation, firefighting or tipping points.

I think there’s an insight and a puzzle here. The insight is that mismanaged dynamics and misperceptions of feedback aren’t the only way to screw up. There are exogenous and single-cause failure modes, like hiring people with the wrong skill set for a job, building something no one wants, or just failing to keep in touch with your team.

However, I’m pretty sure the dominant cause of execution failure is dynamic. Large projects are like sleeping monsters. They are full of positive feedback loops that, when triggered, cause escalating delays and overruns, perhaps explaining the heavy-tailed distribution of massive project failures. So the puzzle is: how could there be so little mention of, and so few tools for, managing the internal causes of project success?

Not coincidentally, this problem is one of the major reasons we built Ventity. We’re currently working on project models that are entirely data driven, so you can switch from building a house to building a power plant just by changing some tables of input. We think this will be the missing link between data-oriented tools that manage projects statically in exquisite detail and dynamic models that realistically describe projects, but have traditionally been hard to build, calibrate and reuse.

Detecting the inconsistency of BS

DARPA put out a request for a BS detector for science. I responded with a strategy for combining the results of multiple models (using Mohammad Jalali’s multivariate meta-analysis with some supporting infrastructure like data archiving) to establish whether new findings are consistent with an existing body of knowledge.

DARPA didn’t bite. I have no idea why, but from the RFC I’d speculate that they had in mind something more like a big data approach that would use text analysis to evaluate claims. Hopefully not, because a text-only approach will have limited power.
