Prediction, in context

I’m increasingly running into machine learning approaches to prediction in health care. A common application is identification of risks for (expensive) infections or readmission. The basic idea is to treat patients like a function approximation problem.

The hospital compiles a big dataset on patient demographics, health status, exposure to procedures, and infection outcomes. A vendor slurps this up and turns some algorithm loose on the data, seeking the risk factors associated with the infection. It might look like this:

… except that there might be 200 predictors, not six – more than you can handle by eyeballing scatter plots or control charts. Once you have a risk model, you know which patients to target for mitigation, and maybe also which associated factors to pursue further.

However, this is only half the battle. Systems thinkers will recognize this model as a dead buffalo: a laundry list with unidirectional causality. The real situation is rich in feedback, including a lot of things that probably don’t get measured, and therefore don’t end up in the data for consideration by the algorithm. For example:

Infections aren’t just a random event for the patient; they happen for reasons that are larger than the patient. Even worse, there are positive feedbacks that can make prevention of infections, and errors more generally, hard to manage. For example, as the number of patients with infections rises, workload goes up, which creates time pressure and fatigue. That induces shortcuts and errors that create risk for patients, leading to more infections. Infections spread to other patients. Fatigued staff burn out and turn over faster, which dilutes the staff experience that might otherwise mitigate risk. (Experience, like many other dynamics, is not shown above.)
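The workload loop described above can be sketched as a toy one-stock simulation. This is only an illustration under invented assumptions, not a calibrated hospital model: the function name, every parameter value, and the power-law link between workload and infection risk are hypothetical. Experience, turnover, and patient-to-patient spread are left out.

```python
def simulate_infections(error_sensitivity, weeks=52, patients=100.0,
                        staff_capacity=100.0, care_per_infection=5.0,
                        base_risk=0.02, recovery_weeks=2.0):
    """Weekly-step simulation of one stock: currently infected patients.

    All parameter values are hypothetical and chosen only for illustration.
    """
    infected = 2.0
    history = [infected]
    for _ in range(weeks):
        # Each infected patient adds extra care work on top of the routine load.
        workload_ratio = (patients + care_per_infection * infected) / staff_capacity
        # Assumed link: time pressure raises weekly infection risk as a power
        # of the workload ratio. error_sensitivity = 0 switches the loop off.
        risk = base_risk * workload_ratio ** error_sensitivity
        new_infections = risk * (patients - infected)
        recoveries = infected / recovery_weeks
        infected += new_infections - recoveries
        history.append(infected)
    return history


no_feedback = simulate_infections(error_sensitivity=0.0)
with_feedback = simulate_infections(error_sensitivity=2.0)
print("steady infections, loop off:", round(no_feedback[-1], 2))
print("steady infections, loop on: ", round(with_feedback[-1], 2))
```

With identical baseline risk, switching the loop on settles at a higher infection level, which is the sense in which the feedback amplifies whatever risk is already present.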

An algorithm that predicts risk in this context is certainly useful, because anything that reduces risk helps to diminish the gain of the vicious cycles. But it’s no longer so clear what to do with the patient assessments. Time spent on staff education and action for risk mitigation has to come from somewhere, and therefore might have unintended consequences that aren’t assessed by the algorithm. The algorithm is actually blind in two ways: it can’t respond to any input (like staff fatigue or skill) that isn’t in the data, and it probably isn’t statistically smart enough to deal with the separation of cause and effect in time and space that arises in a feedback system.

Deep learning systems like AlphaGo Zero might learn to deal with dynamics. But so far, high performance requires very large numbers of exemplars for reinforcement learning, and that’s never going to happen in a community hospital dataset. Then again, we humans aren’t too good at managing dynamic complexity either. But until the machines take over, we can build dynamic models to sort these problems out. By taking an endogenous point of view, we can put machine learning in context, refine our understanding of leverage points, and redesign systems for greater performance.

The Ryan health care proposal

The Ryan budget proposal achieves the bulk of its savings by cutting health care outlays, particularly Medicare and Medicaid. The mechanism sounds a lot like a firm’s transition from a defined benefits pension plan to a defined contribution scheme. Medicaid becomes a system of block grants to states, and Medicare becomes a system of flat-rate vouchers. Along the way, it has some useful aspirations: to separate health insurance from employment and eliminate health’s favored tax status.

Reading some of the finer print, though, I don’t think it really fixes the fundamental flaws of the current system. It’s billed as “universal access” but that’s a misnomer. It guarantees universal access to a tax credit or voucher that can be used to purchase coverage, but not universal access to coverage. That’s because it doesn’t solve the adverse selection problem. As a result, any provider that doesn’t play the usual game of excluding anyone who’s really sick from coverage (using preexisting conditions and rotating plan changes) will suffer a variant of the utility death spiral: increasing costs drive the healthy out of the plan, leaving it to serve a diminishing set of members who had the misfortune to get sick, at an escalating cost.

Universal access to coverage is left to the states, who can create assigned risk pools or other methods to cover the uncoverable. Leaving things to the states strikes me as a reasonable strategy, because the health system is so complex that evolutionary learning is likely to beat the kind of deliberate design we’ll get out of Congress. But it’s not clear to me that the proposal creates any real authority to raise money to support these assigned risk pools; without money, the state mechanisms will be rather perfunctory.

The real challenge seems to me to be to address three features of health:

  • Prevention beats cure by a long shot, in terms of both cost and quality of life. In the current system, patient churn through providers eliminates most of the provider-side incentive to address this. Patients have contributed by abdicating responsibility for their own health, and insurance exacerbates the problem by obscuring the costs of the quadruple bypass that follows from a life of Big Macs.
  • Health care expenditures are extremely skewed over one’s lifetime and within age cohorts. Good behavior can’t mitigate all risk, particularly the risk of getting old. (See below for a peek at the data.)
  • In some circumstances, the health care system is capable of expending an extremely large amount of resources on a person – sometimes for a miraculous outcome, and sometimes for rather marginal end-of-life extension.

What’s needed is a distributed way to share risk (which is why it’s called insurance), while preserving incentives for good behavior and matching total expenditures to resources. That’s a tall order. It’s not clear to me that the Ryan proposal tackles it in any serious way; it just extends the flaws of the current system to Medicare patients.

Per capita annual medical expenditures from the MEPS panel, by age and income. There’s surprisingly little variation by income, but a lot by age. The bill terminates the agency that collects this data.

Health expenditures by age and decile of cohort, showing the extreme concentration of expenditures at all ages.

The really fine print, the text of the bill itself, is daunting – 629 pages. This strikes me as simply unmanageable (like the deceased cap and trade legislation). There are simply too many opportunities for unintended consequences, and hidden agendas, in such a multifaceted approach, especially with the opaque analytic support available. Surely this could be tackled in a series of smaller bites – health, revenue, other expenditures. It calls to mind the criticism of the FAA’s repeated failure to redesign the air traffic control system, “you can’t design a system that evolved.” Well, maybe you can, but not with the kind of tools and discourse that now prevail.

Interactive diagrams – obesity dynamics

Food-nutrition-health-exercise-energy interactions are an amazing nest of positive feedbacks, with many win-win opportunities, but more on that another time.

Instead, I’m hoisting an interesting influence diagram about obesity from the comments. At first glance, it’s just another plate of spaghetti.


But when you follow the link (do it now), there’s an interesting innovation: the diagram is interactive. You can zoom, scroll, and highlight particular sectors and dynamics. There’s some narrative here and here.

It took me a while to decide whether I’d call this a causal loop diagram or not. I think the primary distinction between a CLD and other kinds of mindmaps or process diagrams is the use of variables. On a CLD, each label represents a quantity that can vary, with a definite direction – TV Watching, Stress, Use of Medicines. Items on other kinds of diagrams might represent events or fuzzier constellations of concepts. This diagram doesn’t have link polarities (too bad) or loop polarities (which would be pretty incomprehensible anyway), but many other CLDs also avoid such labels for simplicity.

I think there’s a lot of potential for further exploration of this idea. There’s a lot you could do to relate structure to behavior, or at least to explain the rationale for structure (both shortcomings of the diagram). Each link, for example, could have its tale revealed when clicked, and key loops could be animated individually, with stories told. Drill-down could be extended to provide links between top-level subsystem relationships and more microscopic views.

I think huge diagrams like the one above are always going to be overwhelming to a layperson. Also, it’s hard to make even a small CLD good, so making a big one really accurate is tough. Therefore, I’d rather see advanced CLD presentations used to improve the communication of simpler stories, with a few loops. However, big or small, there might be many common technological benefits from dedicated diagramming software.

John Sterman on solving our biggest problems

The key message is that climate, health, and other big messy problems don’t have purely technical fixes. Therefore Manhattan Project approaches to solving them won’t work. Creating and deploying solutions to these problems requires public involvement and widespread change with distributed leadership. The challenge is to get public understanding of climate to carry the same sense of urgency that drove the civil rights movement. From a series at the IBM Almaden Institute conference.

Ultradian Oscillations of Insulin and Glucose

Citation: Jeppe Sturis, Kenneth S. Polonsky, Erik Mosekilde, and Eve Van Cauter. Computer Model for Mechanisms Underlying Ultradian Oscillations of Insulin and Glucose. Am. J. Physiol. 260 (Endocrinol. Metab. 23): E801-E809, 1991.

Source: Replicated by Hank Taylor

Units: Yes! (previously no; see the 10/2017 update)

Format: Vensim

Ultradian Oscillations of Insulin and Glucose (Vensim .vpm)

Update, 10/2017:

Refreshed, with units defined (mathematically the same as before): ultradia2.vpm ultradia2.mdl

Further refined, for initialization in equilibrium (insulin by analytic expression; glucose by parameter). Glucose infusion turned on by default. Graphs added.

ultradia-enhanced-3.mdl ultradia-enhanced-3.vpm

The Health Care Death Spiral

Paul Krugman documents an ongoing health care death spiral in California:

Here’s the story: About 800,000 people in California who buy insurance on the individual market — as opposed to getting it through their employers — are covered by Anthem Blue Cross, a WellPoint subsidiary. These are the people who were recently told to expect dramatic rate increases, in some cases as high as 39 percent.

Why the huge increase? It’s not profiteering, says WellPoint, which claims instead (without using the term) that it’s facing a classic insurance death spiral.

Bear in mind that private health insurance only works if insurers can sell policies to both sick and healthy customers. If too many healthy people decide that they’d rather take their chances and remain uninsured, the risk pool deteriorates, forcing insurers to raise premiums. This, in turn, leads more healthy people to drop coverage, worsening the risk pool even further, and so on.

A death spiral arises when a positive feedback loop runs as a vicious cycle. Another example is Andy Ford’s utility death spiral. The existence of the positive feedback leads to counter-intuitive policy prescriptions.
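The risk-pool loop in the quoted passage can be sketched in a few lines. This is a deliberately crude illustration, not WellPoint’s situation or anyone’s actuarial model: the function name, the rule that a member drops coverage when the premium exceeds a fixed multiple of their own expected cost, and the cost distribution are all assumptions made up for the example.

```python
def death_spiral(expected_costs, tolerance=1.5, max_rounds=50):
    """Iterate a community-rated pool until nobody else wants to leave.

    Hypothetical rule: a member stays only while the premium is at most
    `tolerance` times their own expected annual cost.
    Returns a list of (pool_size, premium) per pricing round.
    """
    pool = list(expected_costs)
    history = []
    for _ in range(max_rounds):
        if not pool:
            break
        # The premium just covers the average expected cost of current members.
        premium = sum(pool) / len(pool)
        history.append((len(pool), premium))
        stayers = [cost for cost in pool if premium <= tolerance * cost]
        if len(stayers) == len(pool):
            break  # pool is stable at this premium
        pool = stayers
    return history


# Made-up skewed population: mostly healthy, a few very expensive members.
population = [1000] * 70 + [5000] * 20 + [30000] * 10
rounds = death_spiral(population)
for size, premium in rounds:
    print(f"members: {size:3d}  premium: {premium:9.2f}")
```

Each repricing drives out the cheapest remaining members, so the pool shrinks while the premium climbs, until only the members with the highest expected costs are left.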

The Blood-Hungry Spleen

OK, I’ve stolen another title, this time from a favorite kids’ book. This post is really about the thyroid, which is a little less catchy than the spleen.

Your hormones are exciting!
They stir your body up.
They’re made by glands (called endocrine)
and give your body pluck.

Allan Wolf & Greg Clarke, The Blood-Hungry Spleen

A friend has been diagnosed with hypothyroidism, so I did some digging on the workings of the thyroid. A few hours searching citations on PubMed, Medline and Google gave me enough material to create this diagram:

Thyroid function and some associated feedbacks

(This is a LARGE image, so click through and zoom in to do it justice.)

The bottom half is the thyroid control system, as it is typically described. The top half strays into the insulin regulation system (borrowed from a classic SD model), body fat regulation, and other areas that seem related. A lot of the causal links above are speculative, and I have little hope of turning the diagram into a running model. Unfortunately, I can’t find anything in the literature that really digs into the dynamics of the system. In fact, I can’t even find the basics – how much stuff is in each stock, and how long does it stay there? There is a core of the system that I hope to get running at some point though:

Thyroid - core regulation and dose titration

(another largish image)

This is the part of the system that’s typically involved in the treatment of hypothyroidism with synthetic hormone replacements. Normally, the body runs a negative feedback loop in which thyroid hormone levels (T3/T4) govern production of TSH, which in turn controls the production of T3 and T4. The problem begins when something (perhaps an autoimmune disease, i.e. Hashimoto’s) diminishes the thyroid’s ability to produce T3 and T4 (reducing the two inflows in the big yellow box at center). Then therapy seeks to replace the natural control loop, by adjusting a dose of synthetic T4 (levothyroxine) until the measured level of TSH (left stock structure) reaches a desired target.

This is a negative feedback loop with fairly long delays, so dosage adjustments are made only at infrequent intervals, in order to allow the system to settle between changes. Otherwise, you’d have the aggressive shower taker problem: water’s too cold, crank up the hot water … ouch, too hot, turn it way down … eek, too cold …. Measurements of T3 and T4 are made, but seldom paid much heed – the TSH level is regarded as the “gold standard.”
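The shower-taker problem is easy to reproduce in a toy loop. The sketch below is not a physiological model: the indicator is an abstract, normalized quantity that simply follows the dose through two first-order lags (real TSH moves inversely to thyroid hormone), and the function name, the two-week time constants, the gain, and the adjustment rule are all assumptions for illustration.

```python
def titrate(interval_weeks, gain=1.0, weeks=52, tau_weeks=2.0,
            target=1.0, steps_per_week=10):
    """Adjust a dose every `interval_weeks` based on a lagged indicator.

    Hypothetical plant: dose -> hormone -> indicator, each a first-order
    lag with time constant `tau_weeks`, in normalized units where the
    indicator eventually equals the dose. Returns the indicator trace.
    """
    dt = 1.0 / steps_per_week
    dose = hormone = indicator = 0.5  # start in equilibrium, below target
    trace = []
    for step in range(weeks * steps_per_week):
        if step % (interval_weeks * steps_per_week) == 0:
            # Correct the dose by the currently measured gap (never below zero).
            dose = max(0.0, dose + gain * (target - indicator))
        # Euler integration of the two lags.
        hormone += dt * (dose - hormone) / tau_weeks
        indicator += dt * (hormone - indicator) / tau_weeks
        trace.append(indicator)
    return trace


impatient = titrate(interval_weeks=1)   # adjusts before the system settles
patient = titrate(interval_weeks=12)    # waits for the system to settle
overshoot_impatient = max(impatient) - 1.0
overshoot_patient = max(patient) - 1.0
print("overshoot, weekly adjustment:  ", round(overshoot_impatient, 3))
print("overshoot, 12-week adjustment: ", round(overshoot_patient, 3))
```

With the same gain, the weekly adjuster keeps adding dose while earlier changes are still working through the delays and overshoots the target substantially, whereas the 12-week adjuster barely overshoots. The price of patience is that each correction takes months to evaluate.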

This black box approach to control is probably effective for many patients, but it leaves me feeling uneasy about several things. The “normal” range for TSH varies by an order of magnitude; what basis is there for choosing one or the other end of the range as a target? Wouldn’t we expect variation among patients in the appropriate target level? How do we know that TSH levels are a reliable indicator, if they don’t correlate well with T3/T4 levels or symptoms? Are extremely sparse measurements of TSH really robust to variability on various time scales, or is dose titration vulnerable to noise?

One could imagine alternative approaches to control, using direct measurements of T3 and T4, or indirect measurements (symptoms). Those might have the advantage of less delay (fewer confounding states between the goal state and the measured state). But T3/T4 measurements seem to be regarded as unreliable, which might have something to do with the fact that it’s hard to find any information on the scale or dynamics of their reservoirs. Symptoms also take a back seat; one paper even demonstrates fairly convincingly that dosage changes +/- 25% have no effect on symptoms (so why are we doing this again?).

I’d like to have a more systemic understanding of both the internal dynamics of the thyroid regulation system, and its interaction with symptoms, behaviors, and other regulatory systems. Here’s hoping that one of you lurkers (I know you’re out there) can comment with some thoughts or references.

So the spleen doesn’t feel shortchanged, I’ll leave you with another favorite:

I think that I ain’t never seen
A poem ugly as a spleen.
A poem that could make you shiver
Like 3.5 … pounds of liver.
A poem to make you lose your lunch,
Tie your intestines in a bunch.
A poem all gray, wet, and swollen,
Like a stomach or a colon.
Something like your kidney, lung,
Pancreas, bladder, even tongue.
Why you turning green, good buddy?
It’s just human body study.

Jon Scieszka & Lane Smith, Science Verse

Life Expectancy and Equity

Today ScienceDaily brought the troubling news that, “There was a steady increase in mortality inequality across the US counties between 1983 and 1999, resulting from stagnation or increase in mortality among the worst-off segment of the population.” The full article is PLoS Medicine Vol. 5, No. 4, e66 doi:10.1371/journal.pmed.0050066. ScienceDaily quotes the authors,

Ezzati said, “The finding that 4% of the male population and 19% of the female population experienced either decline or stagnation in mortality is a major public health concern.” Christopher Murray, Director of the Institute for Health Metrics and Evaluation at the University of Washington and co-author of the study, added that “life expectancy decline is something that has traditionally been considered a sign that the health and social systems have failed, as has been the case in parts of Africa and Eastern Europe. The fact that is happening to a large number of Americans should be a sign that the U.S. health system needs serious rethinking.”

I question whether it’s just the health system that requires rethinking. Health is part of a complex system of income and wealth, education, and lifestyle choices:

Health in context
