Big Data Gone Bad

An integrated market model is a hungry beast. It wants data from a variety of areas of a firm’s business, often from a variety of sources. As I said in my previous post, typically these data streams have never been considered together before, and therefore they’re full of contradictions and quality issues. Here’s a real world example, from the pharma business. The details are proprietary, and I’ve stylized the data, but the story is pretty simple.

Suppose you have a product with two different indications. One is short term (for injuries, a 4 month treatment), and one is long term (for a chronic condition, over 24 months). It’s of obvious interest to understand the two markets individually, to enable allocation of resources to distinct marketing efforts for each set of doctors and patients.

Here’s the structure of the market:

New patients are started on therapy. They remain in the stock of Patients for some time, before they drop out of therapy or switch to another drug. Initially, just the short term indication is approved; the long term indication gets approved a year into the simulation:

There are twice as many short term starts, but the long term patients stick around 6 times as long, so ultimately there are a lot more of them:

Notice that this is simple first-order goal seeking behavior. The long term patient population is rising toward an equilibrium of (1000 patients/month)x(24 months persistence)=24,000 patients, over a time scale of 24 months.

Puzzle #1

Suppose the data for the long term patients is doing something different (note that the colors now refer to model and data):

The model is goal-seeking, but the patient population data keeps rising. Bathtub dynamics says that it’s impossible for the step in the inflow of starts to integrate to this pattern when the outflow of dropouts is first order. You’d have to conclude that the model can’t fit the data, without invoking some additional assumptions. For example, the persistence of the long term patients might be increasing as doctors gain experience or the composition of the patient population changes.

But what if I told you that the driving data, new starts, isn’t a “real” measurement? First, new prescriptions aren’t easy to distinguish from refills, and there’s a certain amount of overcounting when patients switch pharmacies or otherwise drop out of the data, then reappear. Second, the short term and long term patients take the same drug, and prescription records don’t say why. So, the data vendor infers the split from dosages, prescriber specialties, and the phase of the moon. The inference happens in an undocumented black box algorithm and there’s no way to establish the ground truth of its performance.

Now, do you trust the algorithm, or doctors who say they know the duration of treatment – but might be missing something too?

Puzzle #2

Even in the presence of algorithmic uncertainty, you’d expect certain dynamic reality checks to pass. Consider the share of long term patients in the market. For new starts, it’s a step function, rising from 0 to 1/3 at launch in month 12:

Again, from the bathtub, we know that the patient population can’t instantly mimic the step in starts. If the system is first order with constant persistence, the long term share of patients should rise gradually to 3/4 (1000*24/(1000*24+2000*4)). If persistence is increasing, per puzzle #1, it might go higher on a longer time scale, but it can’t go faster.

Now, suppose the data does something unexpected:

Here, the patient population share data mimics the share of new starts with a time constant that’s very short compared to the persistence of therapy. This should be dynamically impossible in a simple system. But, as always, you could start invoking time varying inputs or parameters to explain what the data shows. (And remember that the real data is noisy, making it harder to be sure about anything.)

But I think there’s another, simpler explanation. The data vendor could be using the same or similar algorithms to classify new starts and existing patients. It could be wrong about the inflow split, or wrong about the stocks, or both. And, it could be reclassifying existing patients from short to long and back with a time constant much faster than the persistence of therapy permits.


It turns out that, in spite of having lots of data about this system, we don’t actually know much. This is a problem for model calibration, because we don’t know which source to trust. Uncertainty in the calibration propagates into decision making. It’s awkward for people in the firm to revise the stories they’ve used to justify past actions. It ought to be awkward for the data vendor to provide flaky information, but luckily they have a near-monopoly.

But we still have options:

  • Track down the data issues. This is the most attractive idea in principle, but it might be slow and expensive to find someone at the data vendor who knows what’s going on, and even then the answer might be unsatisfactory.
  • Model the data. If some details of the data collection process are known, it’s often possible to reverse engineer the “real” data from flawed measurements.
  • Split the difference. Calibrate as best you can to all available information, including gut feel and known “physics” of the situation, not just the numerical data.
  • Embrace the uncertainty. If no theory fits the data, look for policies that are robust to alternative futures, and convey the irreducible uncertainty of the situation to decision makers.

A real challenge for modelers is that model consumers typically have science tastes on a propaganda budget. People are used to seeing data that looks precise, full of enticing detail, with conclusions that sound plausible, but are little more than superstition. It’s cheap to make nice graphics and long figure-rich Powerpoint decks.

Really sorting out what’s going on in situations like this is hard, but it can have great strategic value. For example, in this case, if persistence is increasing, it’s more critical than ever to win the long term patients. If market shares could differ dramatically from what measurements report, competitive threats and opportunities could go unnoticed. Anyone who can use models to discover the fog of data and see through it will have a real competitive edge.

Thyroid Dynamics

Quite a while back, I posted about the dynamics of the thyroid and its interactions with other systems.

That was a conceptual model; this is a mathematical model. This is a Vensim replication of:

Marisa Eisenberg, Mary Samuels, and Joseph J. DiStefano III

Extensions, Validation, and Clinical Applications of a Feedback Control System Simulator of the Hypothalamo-Pituitary-Thyroid Axis

Background:We upgraded our recent feedback control system (FBCS) simulation model of human thyroid hormone (TH) regulation to include explicit representation of hypothalamic and pituitary dynamics, and up-dated TH distribution and elimination (D&E) parameters. This new model greatly expands the range of clinical and basic science scenarios explorable by computer simulation.

Methods: We quantified the model from pharmacokinetic (PK) and physiological human data and validated it comparatively against several independent clinical data sets. We then explored three contemporary clinical issues with the new model: …

… These results highlight how highly nonlinear feedback in the hypothalamic-pituitary-thyroid axis acts to maintain normal hormone levels, even with severely reduced TSH secretion.

Volume 18, Number 10, 2008
DOI: 10.1089=thy.2007.0388

This version is a superset of the authors’ earlier 2006 model, and closely reproduces that with a few parameter changes.

L-T4 Bioequivalence and Hormone Replacement Studies via Feedback Control Simulations

Volume 16, Number 12, 2006

The model is used in:

TSH-Based Protocol, Tablet Instability, and Absorption Effects on L-T4 Bioequivalence

Volume 19, Number 2, 2009
DOI: 10.1089=thy.2008.0148

This works with any Vensim version:

thyroid 2008 d.mdl

thyroid 2008 d.vpm

Prediction, in context

I’m increasingly running into machine learning approaches to prediction in health care. A common application is identification of risks for (expensive) infections or readmission. The basic idea is to treat patients like a function approximation problem.

The hospital compiles a big dataset on patient demographics, health status, exposure to procedures, and infection outcomes. A vendor slurps this up and turns some algorithm loose on the data, seeking the risk factors associated with the infection. It might look like this:

… except that there might be 200 predictors, not six – more than you can handle by eyeballing scatter plots or control charts. Once you have a risk model, you know which patients to target for mitigation, and maybe also which associated factors to pursue further.

However, this is only half the battle. Systems thinkers will recognize this model as a dead buffalo: a laundry list with unidirectional causality. The real situation is rich in feedback, including a lot of things that probably don’t get measured, and therefore don’t end up in the data for consideration by the algorithm. For example:

Infections aren’t just a random event for the patient; they happen for reasons that are larger than the patient. Even worse, there are positive feedbacks that can make prevention of infections, and errors more generally, hard to manage. For example, as the number of patients with infections rises, workload goes up, which creates time pressure and fatigue. That induces shortcuts and errors that create risk for patients, leading to more infections. Infections spread to other patients. Fatigued staff burn out and turn over faster, which dilutes the staff experience that might otherwise mitigate risk. (Experience, like many other dynamics, is not shown above.)

An algorithm that predicts risk in this context is certainly useful, because anything that reduces risk helps to diminish the gain of the vicious cycles. But it’s no longer so clear what to do with the patient assessments. Time spent on staff education and action for risk mitigation has to come from somewhere, and therefore might have unintended consequences that aren’t assessed by the algorithm. The algorithm is actually blind in two ways: it can’t respond to any input (like staff fatigue or skill) that isn’t in the data, and it probably  isn’t statistically smart enough to deal with the separation of cause and effect in time and space that arises in a feedback system.

Deep learning systems like Alpha Go Zero might learn to deal with dynamics. But so far, high performance requires very large numbers of exemplars for reinforcement learning, and that’s never going to happen in a community hospital dataset. Then again, we humans aren’t too good at managing dynamic complexity either. But until the machines take over, we can build dynamic models to sort these problems out. By taking an endogenous point of view, we can put machine learning in context, refine our understanding of leverage points, and redesign systems for greater performance.

Exponential Epi Pens

Mylan Pharmaceuticals is in the news for taking the price of EpiPens, which contain about $1 of active ingredient, to stratospheric levels. I think Bloomberg broke the story, and the NY Times has the latest.

Here’s the price trajectory:


epi pen data.xlsx

The rate of increase is not that far from the health care inflation rate in general, except that in this case, there’s no obvious underlying cost driver, hence the allegations of gouging.

Here’s a first cut at the structure of the problem:


Econ 101 says that high profits should attract competition, putting downward pressure on prices (loop B 101). However, that’s not happening, because the FDA is the gatekeeper on product approval. It’s not clear to me whether the FDA just makes the approval delay systematically long and uncertain, or that it’s actually captive to Mylan lobbyists and holding new entrants to higher standards, as some hint (that would be a reinforcing loop, R2). Either way, the only loop that’s functioning is Mylan’s reinvestment in marketing and lobbying to create demand (R 1).

This reminds me of California’s electricity market deregulation debacle, which created a wholesale power market without corresponding retail price elasticity. Utilities were stranded between hammer (floating generation prices) and anvil (fixed demand). The resulting mess was worse than might have occurred in either a more or less deregulated market.

Similarly, to bring this market under control, you’d either have to get the FDA out of the way, restoring the balancing loop, or regulate the price side of the market, constraining the reinforcing loop. In this case, it may be the court of public opinion that puts the brakes on, adding a balancing loop of bad press that has so far cost Mylan dearly in investor confidence, if nothing else.

Mylan responds to gouging allegations rather unconvincingly, I think. Their CEO argues that the problem is multiple markups in the supply chain, subsidization of Europe, and R&D. It’s hard to square those external-cause arguments with Mylan’s financials.

Blood pressure regulation

The Tech Review Arxiv blog has a neat summary of new research on high blood pressure. It turns out that the culprit may be a feedback mechanism that can’t adequately respond to stiffening of the arteries with age:

The human body has a well understood mechanism for monitoring blood pressure changes, consisting of sensors embedded in the major arterial walls that monitor changes in pressure and then trigger other changes in the body to increase or reduce the pressure as necessary, such as the regulation of the volume of fluid in the blood vessels. This is known as the baroreceptor reflex.

So an interesting question is why this system does not respond appropriately as the body ages. Why, for example, does this system not reduce the volume of fluid in the blood to decrease the pressure when it senses a high systolic pressure in an elderly person?

The theory that Pettersen and co have tested is that the sensors in the arterial walls do not directly measure pressure but instead measure strain, that is the deformation of the arterial walls.

As these walls stiffen due to the natural ageing process, the sensors become less able to monitors changes in pressure and therefore less able to compensate.

Spot the health care smokescreen

A Tea Party presentation on health care making the rounds in Montana claims that life expectancy is a smoke screen, and it’s death rates we should be looking at. The implication is that we shouldn’t envy Japan’s longer life expectancy, because the US has lower death rates, indicating superior performance of our health care system.

Which metric really makes the most sense from a systems perspective?

Here’s a simple, 2nd order model of life and death:

From the structure, you can immediately observe something important: life expectancy is a function only of parameters, while the death rate also includes the system states. In other words, life expectancy reflects the expected life trajectory of a person, given structure and parameters, while the aggregate death rate weights parameters (cohort death rates) by the system state (the distribution of population between old and young).

In the long run, the two metrics tell you the same thing, because the system comes into equilibrium such that the death rate is the inverse of the life expectancy. But people live a long time, so it might take decades or even centuries to achieve that equilibrium. In the meantime, the death rate can take on any value between the death rates of the young and old cohorts, which is not really helpful for understanding what a new person can expect out of life.

So, to the extent that health care performance is visible in the system trajectory at all, and not confounded by lifestyle choices, life expectancy is the metric that tells you about performance, and the aggregate death rate is the smokescreen.

Here’s the model: LifeExpectancyDeathRate.mdl or LifeExpectancyDeathRate.vpm

It’s initialized in equilibrium. You can explore disequilbrium situations by varying the initial population distribution (Init Young People & Init Old People), or testing step changes in the death rates.

False positives, publication bias and systems models

A PLOS Medicine paper asserts that most published results are false.

It can be proven that most claimed research findings are false

Corollary 1: The smaller the studies conducted in a scientific field, the less likely the research findings are to be true.

Corollary 2: The smaller the effect sizes in a scientific field, the less likely the research findings are to be true.

Corollary 3: The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true.

Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true.

Corollary 5: The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true.

Corollary 6: The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true.

This somewhat alarming result arises from fairly simple statistics of false positives, publication selection bias, and causation vs. correlation problems. While the math is incontrovertible, some of the assumptions have been challenged:

… calculating the unreliability of the medical research literature, in whole or in part, requires more empirical evidence and different inferential models than were used. The claim that “most research findings are false for most research designs and for most fields” must be considered as yet unproven.

Still, the argument seems to be a matter of how much rather than whether publication bias influences findings:

We agree with the paper’s conclusions and recommendations that many medical research findings are less definitive than readers suspect, that P-values are widely misinterpreted, that bias of various forms is widespread, that multiple approaches are needed to prevent the literature from being systematically biased and the need for more data on the prevalence of false claims.

(Others propose similar challenges. There’s conflicting literature about whether (weak) observational studies hold up with (strong) randomized follow-up trials.)

This is obviously a big problem from a control perspective, because the kind of information provided by the studies in question is key to managing many systems, as in Nancy Leveson‘s pharma safety example:

It’s also leads me to a rather pointed self-question. To what extent is typical system dynamics modeling practice subject to the same kinds of biases? Can we say not only that all models are wrong, but that most are useless?

First the good news.

  • SD doesn’t usually operate in the data mining space, where large observational studies seek effects absent any a priori causal theory. That means we’re not operating where false positives are most likely to arise.
  • Often, SD practitioners are not testing our own pet theories, but those of some decision makers – perhaps even theories of competing interests in an organization.
  • SD models play a “knowledge integration” role that’s somewhat analogous to meta-analysis. A meta-analysis pools the statistics from a number of replications of some observation, which improves the signal to noise ratio, making it easier to see whether there’s any baby in the bathwater. An SD model instead pools the effect sizes of inputs (studies or anecdotes) and puts them to a functional test: do the individual components, assembled into a system, yield the observed behavior of the macro system?
  • Similarly, good SD modelers tend to supplement purely statistical inputs with Reality Checks that effectively provide additional data verification by testing extreme conditions where outcomes are known (though this is not helpful if you don’t know anything about relationships to begin with).
  • Including physics (using the term loosely to include things like conservation of people) in models also greatly constrains the space of plausible hypotheses a priori.

Now the bad news.

  • Models are often used in one-off, non-replicable strategic decision making situations, so we’ll never know. Refereed forecasting helps, but success can still be due to luck rather than skill.
  • We often have to formalize soft variable concepts for which definitions are uncertain and measurements are lacking.
  • SD models are often reliant on thin literature bases, small studies, or subject matter expertise to establish relationships. Studies with randomized control are a rarity.
  • Available data for model verification is often of low quality and short duration.
  • Data can provide a weak check on the model – if a system exhibits exponential growth, for example, one positive feedback loop in the dynamic hypothesis is as good as another (though of course good a priori explanations of the structure of the system help).

My suspicion is that savvy modelers are already well aware of just how messy and uncertain their problem domains are. Decisions will be taken, with or without a model, so the real objective is to use the model to add value by rejecting ideas that don’t work. The problem then is not that wrong models make decisions worse, but that we could probably do a lot better if we could be smarter about the possible biases in models and thinking in general.

Alex Tabarrok at Marginal Revolution has a nice take on remedies:

What can be done about these problems? (Some cribbed straight from Ioannidis and some my own suggestions.)

1) In evaluating any study try to take into account the amount of background noise. That is, remember that the more hypotheses which are tested and the less selection which goes into choosing hypotheses the more likely it is that you are looking at noise.

2) Bigger samples are better. (But note that even big samples won’t help to solve the problems of observational studies which is a whole other problem).

3) Small effects are to be distrusted.

4) Multiple sources and types of evidence are desirable.

5) Evaluate literatures not individual papers.

6) Trust empirical papers which test other people’s theories more than empirical papers which test the author’s theory.

7) As an editor or referee, don’t reject papers that fail to reject the null.

For SD modeling, I’d add a few more:

8) Reserve time for exploration of uncertainty (lots of Monte Carlo simulation).

9) Calibrate your confidence bounds.

10) Help clients to appreciate the extent and implications of uncertainty.

11) Pay attention to the language used to describe statistical concepts. Words like “expectation” and “significance” that have specific mathematical interpretations don’t mean the same thing to managers.

11) Look for robust policies that work irrespective of uncertain relationships.

12) Explicitly seek out and test alternative hypotheses (This sounds like it’s at odds with Corollary 3 above, but I think it’s the right thing to do. Testing multiple hypotheses in the context of the model is not the same thing as mining data for multiple relationships.).

13) If you can’t estimate something directly from data, or back it up with literature (more than a single paper), at least articulate some bounds on the effect, perhaps through experiments with a submodel.

What do you think? When is modeling and statistical analysis helpful, and when is it risky business?



Fat taxes & modeling

NPR covers a Danish move to tax saturated fat:

So when the tiny Scandinavian country announced it would be imposing a 16 Kroner (about $3 U.S.) tax on every kilogram of saturated fat as a way to discourage poor eating habits and raise revenue, we were left scratching our heads.

How’s that going to work?

Ole Linnet Juul, food director at Denmark’s Confederation of Industries, tells The Washington Post that the tax will increase the price of a burger by around $0.15 and raise the price of a small package of butter by around $0.40.

Our pals over at Planet Money took a stab last year at explaining the economics of our version of the fat tax — the soda tax. They conclude that price increases do drive down demand somewhat.

But couldn’t Danes just easily sneak over to neighboring Sweden for butter and oil and simply avoid paying the tax, throwing all revenue calculations off?

Meanwhile, some health studies indicate a soda tax doesn’t work to curb obesity anyways.

First a few obvious problems: oil is typically not saturated and therefore presumably wouldn’t fall under the tax. And sneaking over the border for butter? Seriously? You’d better bring back a heckuva lot, because there’s the little matter of the Øresund Strait, which now has a handy bridge, and a 36 EUR toll to go with it.

More interesting is the use of models in the linked studies. From the second (“doesn’t work”):

But new research from Northwestern University suggests that soda taxes don’t actually help obese people lose weight, largely because people with weight problems already tend to drink diet soda rather than the sugary kind. So taxing full-calorie sodas may not help many Americans make better dietary choices.

Patel ran computer simulations designed to track how soda prices would affect obesity rates. The findings demonstrated that a sugar tax would cause a negligible drop in obesity, about 1.4%, and that obese people would not lose much weight. “For people going from [body mass indexes] of over 30 to below that…most people are not having massive swings,” Patel said.

For the study, Patel’s team collected data on people with “all ranges of BMI” from the Centers for Disease Control and Prevention’s Behavioral Risk Factor Surveillance System, which has tracked health conditions in the U.S. for nearly three decades. They also collected a data set of soda prices and sales to estimate consumer practices, which they used to predict what people would purchase before and after the implementation of a soda tax. Based on the resulting change in total calories consumed per day over a set time period, the team modeled long-term changes in weight using existing nutrition literature.

Kelly Brownell, the director of the Rudd Center for Food Policy and Obesity at Yale University, has doubts about the accuracy of studies such as Patel’s. Simulations of the potential impact of public health actions such as a soda tax are based on a huge number of assumptions — about consumption, spending behavior, weight change — that are, in reality, difficult to make accurately, he explains.

“All of those changes are unknown,” he said. “So it’s not hard to allow those assumptions to create the results you want.”

Patel counters that assumptions are inevitable in research, and that previous studies that have produced results in favor of soda taxes have also made assumptions, typically about consumer preferences. “I’m trying to see if there are any critical assumptions here that really change the results, but so far I haven’t had anything like that,” he said. “It’s a somewhat valid criticism, but the paper is still being fleshed out, and there are a variety of robustness checks.”

But Patel acknowledges that his study could not predict whether a soda tax would help prevent people from consuming sweetened drinks in the first place and becoming fat later on — another point raised by Brownell. “The question of whether a soda tax could prevent people from becoming obese in the future…that’s still kind of an open question because there are some issues on how you model weight change that to my knowledge haven’t been addressed,” he said. “It’s possible that a soda tax could prevent people from becoming obese in the future, but for people already obese it’s not really going to do anything.”

As press coverage of models goes, this is actually pretty good, and Patel is nicely circumspect about the limitations of the work. The last paragraph hints at one thing that strikes me as extremely important though: the study model is essentially open loop, with price->choice->calories->body mass causality. The real world is closed loop, with important feedbacks between health and future choices of diet and exercise, and social interactions involved in choices. I suspect that the net result is that the long term effect of pricing, or any other measure, on health is substantially greater than the open loop analysis indicates, especially if you’re clever about exploiting the various positive loops that create obesity traps.

Brownell’s complaint – that we know nothing, so we can just plug in assumptions to get whatever answer we want – irks me. It betrays an ignorance of models (especially nonlinear dynamic ones), which are typically more constrained than unstated mental models, not less.

There seems to be a flowering of health and obesity models in system dynamics lately, with some interesting posters and papers at the last few conferences. There’s hope for closing those loops yet.

More power of personal feedback

Now that I’ve dumped on emerging behavioral feedback technologies, perhaps I should share a personal success story, in which measurement technology played a key role.

Ten years ago, a routine test revealed that my cholesterol was 280 mg/dl, and even higher in a confirmation test. That’s not instant death, but it’s bad. NIH calls <200 desirable, and many argue for even lower levels.

This was a surprise, because I was getting a fair amount of exercise and eating healthier than the typical American diet. I suspect that their must be some genetic component.

Without any discussion, my doctor handed me a prescription for Lipitor. Now, I liked that doctor, and I know he was smart because we’d just had an interesting conversation about wavelet analysis of time series data in biomedical research. But I think he was operating under the assumption that there was no potential for improvement from behavior change. This idea seems to grip much of the medical profession, and creates nasty self-fulfilling prophecy and eroding goals dynamics.

I decided that I didn’t want to take statins for the rest of my (hopefully long) life, so with the aid of spousal prodding and planning, I eliminated all cholesterol and saturated fats (essentially all animal products) from my diet. I was quickly below 200, and then made more gradual progress to a range of about 160 to 180.

Interestingly, since then I’ve also cut out a lot of carbohydrates, because the rest of my family is gluten intolerant, which takes the fun out of bread and pasta. My cholesterol is now lower than ever, 149 at last check, in spite of adding eggs, a big dietary cholesterol source, back into my diet.

While my wife deserves most of the credit for my success, I think technology played a key role as well. Early on, I bought a home cholesterol test meter (a Bioscanner 2000, predecessor to the CardioChek that I now have). The meter allowed me to close the loop between behavior and outcome without the long delay and expense involved with a trip to the doctor. That obviously had a practical benefit, but it was also very motivating.

Continue reading “More power of personal feedback”

The Ryan health care proposal

The Ryan budget proposal achieves the bulk of its savings by cutting health care outlays, particularly Medicare and Medicaid. The mechanism sounds a lot like a firm’s transition from a defined benefits pension plan to a defined contribution scheme. Medicaid becomes a system of block grants to states, and Medicare becomes a system of flat-rate vouchers. Along the way, it has some useful aspirations: to separate health insurance from employment and eliminate health’s favored tax status.

Reading some of the finer print, though, I don’t think it really fixes the fundamental flaws of the current system. It’s billed as “universal access” but that’s a misnomer. It guarantees universal access to a tax credit or voucher that can be used to purchase coverage, but not universal access to coverage. That’s because it doesn’t solve the adverse selection problem. As a result, any provider that doesn’t play the usual game of excluding anyone who’s really sick from coverage (using preexisting conditions and rotating plan changes) will suffer a variant of the utility death spiral: increasing costs drive the healthy out of the plan, leaving it to serve a diminishing set of members who had the misfortune to get sick, at an escalating cost.

Universal access to coverage is left to the states, who can create assigned risk pools or other methods to cover the uncoverable. Leaving things to the states strikes me as a reasonable strategy, because the health system is so complex that evolutionary learning is likely to beat the kind of deliberate design we’ll get out of congress. But it’s not clear to me that the proposal creates any real authority to raise money to support these assigned risk pools; without money, the state mechanisms will be rather perfunctory.

The real challenge seems to me to be to address three features of health:

  • Prevention beats cure by a long shot, in terms of both cost and quality of life. In the current system, patient churn through providers eliminates most of the provider-side incentive to address this. Patients have contributed by abdicating responsibility for their own health, and insurance exacerbates the problem by obscuring the costs of the quadruple bypass that follows from a life of Big Macs.
  • Health care expenditures are extremely skewed over one’s lifetime and within age cohorts. Good behavior can’t mitigate all risk, particularly the risk of getting old. (See below for a peek at the data.)
  • In some circumstances, the health care system is capable of expending an extremely large amount of resources on a person – sometimes for a miraculous outcome, and sometimes for rather marginal end-of-life extension.

What’s needed is a distributed way to share risk (which is why it’s called insurance), while preserving incentives for good behavior and matching total expenditures to resources. That’s a tall order. It’s not clear to me that the Ryan proposal tackles it in any serious way; it just extends the flaws of the current system to Medicare patients.

healthExpendAgeIncomeMEPSPer capita annual medical expenditures from the MEPS panel, by age and income. There’s surprisingly little variation by income, but a lot by age. The bill terminates the agency that collects this data.

healthExpendAgeDecileMEPSHealth expenditures by age and decile of cohort, showing the extreme concentration of expenditures at all ages.

The really fine print, the text of the bill itself, is daunting – 629 pages. This strikes me as simply unmanageable (like the deceased cap and trade legislation). There are simply too many opportunities for unintended consequences, and hidden agendas, in such a multifaceted approach, especially with the opaque analytic support available. Surely this could be tackled in a series of smaller bites – health, revenue, other expenditures. It calls to mind the criticism of the FAA’s repeated failure to redesign the air traffic control system, “you can’t design a system that evolved.” Well, maybe you can, but not with the kind of tools and discourse that now prevail.