MSU Covid – what will tomorrow bring?

The following is a note I posted to a local listserv earlier in the week. It’s an example of back-of-the-envelope reasoning informed by experience with models, but without actually calibrating a model to verify the results. Often that turns out badly. I’m posting this to archive it for review and discussion later, after new data becomes available (as early as tomorrow, I expect).

I thought about responding to this thread two weeks ago, but at the time numbers were still very low, and data was scarce. However, as an MSU parent, I’ve been watching the reports closely. Now the picture is quite different.

If you haven’t discovered it, Gallatin County publishes MSU stats at the end of the weekly Surveillance Report, found here:

https://www.healthygallatin.org/about-us/press-releases/

For the weeks ending 9/10, 9/17, 9/24, and 10/2, MSU had 3, 7, 66, and 43 new cases. Reported active cases are slightly lower, which indicates that the active case duration is less than a week. That’s inconsistent with the two-week quarantine period normally recommended. It’s hard to see how this could happen, unless quarantine compliance is low or delays cause much of the infectious period to be missed (not good either way).

The huge jump two weeks ago is a concern. Going from 7 to 66 weekly cases in one week is growth of about 32% per day (as a continuous rate), faster than the typical uncontrolled increase in the early days of the epidemic. That could happen from a superspreader event, but more likely reflects insufficient testing to detect a latent outbreak.
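For anyone who wants to check the arithmetic, here is a minimal sketch of the growth-rate calculation from the two weekly counts quoted above. It assumes smooth exponential growth between the two reports, which is a simplification; the variable names are just for illustration.

```python
import math

def continuous_daily_rate(cases_before: float, cases_after: float, days: float) -> float:
    """Continuous growth rate per day, assuming cases ~ exp(r * t)."""
    return math.log(cases_after / cases_before) / days

# Weekly new cases for the weeks ending 9/17 and 9/24 (from the text above).
r = continuous_daily_rate(7, 66, 7)

# The same growth expressed as a day-over-day percentage increase.
day_over_day = math.exp(r) - 1

# Doubling time implied by r.
doubling_days = math.log(2) / r

print(f"continuous rate: {r:.2f} per day")
print(f"day-over-day increase: {day_over_day:.0%}")
print(f"doubling time: {doubling_days:.1f} days")
```

The continuous rate comes out near 0.32 per day. Expressed as a day-over-day increase the figure is somewhat higher, so it matters which convention is being quoted.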

Unfortunately they still don’t publish the number of tests done at MSU, so it’s hard to interpret any of the data. We know the upper bound, which is the 2000 or so tests per week reported for all of Gallatin County. Even if all of those were dedicated to MSU, it still wouldn’t be enough to put a serious dent in infection through testing, tracing and isolation. Contrast this with Colby College, which tests everyone twice a week, a test density about 100x greater than Gallatin County+MSU.

In spite of the uncertainty, I think it’s wrong to pin Gallatin County’s increase in cases on MSU. First, COVID prevalence among incoming students was unlikely to be much higher than in the general population. Second, Gallatin County is much larger than MSU, and students interact largely among themselves, so it would be hard for them to infect the broad population. Third, the county has its own reasons for an increase, like reopening schools. Depending on when you start the clock, MSU cases are 18 to 28% of the county total, which is at worst 50% above per capita parity. Recently, there is one feature of concern – the age structure of cases (bottom of page 3 of the surveillance report). This shows that the current acceleration is driven by the 10-19 and 20-29 age groups.

As a wild guess, reported cases might understate the truth by a factor of 10. That would mean 420 active cases at MSU when you account for undetected asymptomatics and presymptomatic untested contacts. That’s out of a student/faculty population of 20,000, so it’s roughly 2% prevalence. A class of 10 would have a 1/5 chance of a positive student, and for 20 it would be 1/3. But those #s could easily be off by a factor of 2 or more.
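The classroom odds above follow from a simple independence calculation. This is a rough sketch, assuming every person in a class is infected independently with the same probability (clustering among friends and roommates would change the answer). The reported active count of 42 is implied by the 420 figure divided by the factor of 10.

```python
def prob_at_least_one(prevalence: float, class_size: int) -> float:
    """Chance a class has at least one infected member, assuming independence."""
    return 1.0 - (1.0 - prevalence) ** class_size

reported_active = 42     # implied by the text: 420 / 10
undercount_factor = 10   # the wild guess
population = 20_000      # students plus faculty

prevalence = reported_active * undercount_factor / population

for size in (10, 20):
    print(f"class of {size}: {prob_at_least_one(prevalence, size):.0%}")

# Sensitivity to the guess being off by a factor of 2 either way.
for factor in (0.5, 2.0):
    p = prob_at_least_one(prevalence * factor, 10)
    print(f"prevalence x{factor}: class of 10 -> {p:.0%}")
```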

Just extrapolating the growth rate (33%/week for cumulative cases), this Friday’s report would be for 61 new cases, 207 cumulative. If you keep going to finals, the cumulative would grow 10x – which basically means everyone gets it at some point, which won’t happen. I don’t know what quarantine capacity is, but suppose that MSU can handle a 300-case week (that’s where things fell apart at UNC). If so, the limit is reached in less than 5 weeks, just short of finals.
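Here is a hedged sketch of the time-to-capacity estimate. It assumes weekly new cases grow at the same 33%/week rate as the cumulative count (which holds under sustained exponential growth) and starts from the 61-case week projected above. The 300-case capacity is the supposition from the text, not a known MSU figure. Whether 33% is read as a continuous rate or as weekly compounding changes the answer, so both are shown.

```python
import math

def weeks_to_reach(start_weekly: float, limit_weekly: float,
                   growth_per_week: float, continuous: bool = True) -> float:
    """Weeks until weekly new cases grow from start_weekly to limit_weekly."""
    ratio = limit_weekly / start_weekly
    if continuous:
        # Exponential growth: cases(t) = start * exp(g * t)
        return math.log(ratio) / growth_per_week
    # Weekly compounding: cases(t) = start * (1 + g) ** t
    return math.log(ratio) / math.log(1.0 + growth_per_week)

weeks_continuous = weeks_to_reach(61, 300, 0.33, continuous=True)
weeks_compounded = weeks_to_reach(61, 300, 0.33, continuous=False)

print(f"continuous rate: {weeks_continuous:.1f} weeks")
print(f"weekly compounding: {weeks_compounded:.1f} weeks")
```

The continuous reading lands just under 5 weeks, while weekly compounding takes a bit longer. Either way the answer is very sensitive to the assumed growth rate.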

I’d say these numbers are discouraging. As a parent, I’m not yet concerned enough to pull my kids out, but they’re nonresidential so their exposure is low. Around classrooms on campus, compliance with masks, sanitizing and distancing is very good – certainly better than it is in town. My primary concern at present is that we don’t know what’s going on, because the published statistics are insufficient to make reliable judgments. Worse, I suspect that no one knows what’s going on, because there simply isn’t enough testing to tell. Tests are pretty cheap now, and the disruption from a surprise outbreak is enormous, so that seems penny wise and pound foolish. The next few weeks will reveal whether we are seeing random variation or the beginning of a large outbreak, but it would be far better to have enough surveillance and data transparency to know now.

Talking with COVID conspiracy theorists

Tech Review has a nice article on how to talk to conspiracy theorists.

What are they hiding in the woods?

I think some of the insights here are also applicable to talking about models, which is turning out to be a real challenge in the COVID era, with high rates of belief in conspiracies. My experience in social media settings is very negative. If I mention anything indicating that I might actually know something about the problem, that triggers immediate suspicion – oh, so you work for the government, eh? Somehow non sequiturs and hearsay beat models every time.

h/t Chris Soderquist for an interesting resource:

“Any assertion of expertise from an actual expert, meanwhile, produces an explosion of anger from certain quarters of the American public, who immediately complain that such claims are nothing more than fallacious “appeals to authority,” sure signs of dreadful “elitism,” and an obvious effort to use credentials to stifle the dialogue required by a “real” democracy. Americans now believe that having equal rights in a political system also means that each person’s opinion about anything must be accepted as equal to anyone else’s. This is the credo of a fair number of people despite being obvious nonsense. It is a flat assertion of actual equality that is always illogical, sometimes funny, and often dangerous.”

From Tom Nichols, “The Death of Expertise.”

I’m finding the Tech Review article’s points 3 & 6 to be most productive: test the waters first, and use the Socratic method (careful questioning to reveal gaps in thinking). But the best advice is really proving to be, don’t look and don’t take on the trolls directly. It’s more productive to help people who are curious and receptive to modeling than to battle people who basically resist everything since the Enlightenment.

On Problem Statements

Saras Chung’s plenary commentary at ISDC 2020 reminded me of this nice article:

The Most Underrated Skill in Management

There are few management skills more powerful than the discipline of clearly articulating the problem you seek to solve before jumping into action.

BY NELSON P. REPENNING, DON KIEFFER, AND TODD ASTOR

http://problemledleadership.mit.edu/wp-content/uploads/The_Most_Underrated_Skill_in_Management-MIT_Sloan_Management_Review.pdf

A good problem statement has five basic elements:

• It references something the organization cares about and connects that element to a clear and specific goal;

• it contains a clear articulation of the gap between the current state and the goal;

• the key variables — the target, the current state, and the gap — are quantifiable;

• it is as neutral as possible concerning possible diagnoses or solutions; and

• it is sufficiently small in scope that you can tackle it quickly.

Excel, lost in the cloud

Excel is rapidly becoming unusable as Microsoft tries to shift everyone into the OneDrive/Sharepoint cloud. Here’s a very simple equation from a population model:

='https://ventanasystems-my.sharepoint.com/personal/vrbo_onmicrosoft_com/Documents/_Mkt/lxpgi/Model/Model/[Cohort Model Natural Increase.xlsx]Boston'!S135+'https://ventanasystems-my.sharepoint.com/personal/vrbo_onmicrosoft_com/Documents/_Mkt/lxpgi/Model/Model/[Cohort Model Immigration.xlsx]Boston'!S119+('https://ventanasystems-my.sharepoint.com/personal/vrbo_onmicrosoft_com/Documents/_Mkt/lxpgi/Model/Model/[Cohort Model NPR.xlsx]Boston'!S119-'https://ventanasystems-my.sharepoint.com/personal/vrbo_onmicrosoft_com/Documents/_Mkt/lxpgi/Model/Model/[Cohort Model.xlsx]Boston'!R119

URLs as equation terms? What were they thinking? This is an interface choice that makes things easy for programmers, and impossible for users.

Coronavirus Roundup II

Some things I’ve found interesting and useful lately:

R0

What I think is a pretty important article from LANL: High Contagiousness and Rapid Spread of Severe Acute Respiratory Syndrome Coronavirus 2. This tackles the questions I wondered about in my steady state growth post, i.e. that high observed growth rates imply high R0 if duration of infectiousness is long.

Earlier in the epidemic, this was already a known problem:

The time scale of asymptomatic transmission affects estimates of epidemic potential in the COVID-19 outbreak

The reproductive number of COVID-19 is higher compared to SARS coronavirus
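To illustrate why long infectious durations turn a given growth rate into a high R0, here is a minimal sketch using the textbook relationships for simple SIR and SEIR models with exponentially distributed stage durations. This is not the method of any of the papers above, and the growth rate and durations are hypothetical values chosen only for illustration.

```python
def r0_sir(growth_rate: float, infectious_days: float) -> float:
    """R0 implied by early growth rate r in a simple SIR model: 1 + r*D."""
    return 1.0 + growth_rate * infectious_days

def r0_seir(growth_rate: float, latent_days: float, infectious_days: float) -> float:
    """R0 implied by growth rate r in a simple SEIR model: (1 + r*Te)(1 + r*Ti)."""
    return (1.0 + growth_rate * latent_days) * (1.0 + growth_rate * infectious_days)

r = 0.25  # hypothetical growth rate, per day

for infectious_days in (4, 8, 12):  # hypothetical durations
    sir = r0_sir(r, infectious_days)
    seir = r0_seir(r, 4, infectious_days)  # hypothetical 4-day latent period
    print(f"D = {infectious_days:2d} days: SIR R0 = {sir:.1f}, SEIR R0 = {seir:.1f}")
```

For a fixed observed growth rate, the implied R0 rises with the assumed duration, which is the point of the articles above.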

Data

Epiforecasts’ time varying R0 estimates

CMMID’s time varying reporting coverage estimates

NECSI’s daily update for the US

The nifty database of US state policies from Raifman et al. at BU

A similar policy tracker for the world

The covidtracking database. Very useful, if you don’t mind a little mysterious turbulence in variable naming.

The Kinsa thermometer US health weather map

Miscellaneous

Nature’s Special report: The simulations driving the world’s response to COVID-19

Pandemics Depress the Economy, Public Health Interventions Do Not: Evidence from the 1918 Flu

Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period has some interesting dynamics, including seasonality.

Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing looks at requirements for contact tracing and isolation

Models for Count Data With Overdispersion has important considerations for calibration

Variolation: hmm. Filed under “interesting but possibly crazy.”

Creative, and less obviously crazy: An alternating lock-down strategy for sustainable mitigation of COVID-19

How useful are antibody tests?

I just ran across this meta-analysis of antibody test performance on medrxiv:

Antibody tests in detecting SARS-CoV-2 infection: a meta-analysis

In total, we identified 38 eligible studies that include data from 7,848 individuals. The analyses showed that tests using the S antigen are more sensitive than N antigen-based tests. IgG tests perform better compared to IgM ones, and show better sensitivity when the samples were taken longer after the onset of symptoms. Moreover, irrespective of the method, a combined IgG/IgM test seems to be a better choice in terms of sensitivity than measuring either antibody type alone. All methods yielded high specificity with some of them (ELISA and LFIA) reaching levels around 99%. ELISA- and CLIA-based methods performed better in terms of sensitivity (90-94%) followed by LFIA and FIA with sensitivities ranging from 80% to 86%.

The sensitivity results are interesting, but I’m more interested in timing:

Sample quality, low antibody concentrations and especially timing of the test - too soon after a person is infected when antibodies have not been developed yet or too late when IgM antibodies have decreased or disappeared - could potentially explain the low ability of the antibody tests to identify people with COVID-19. According to kinetic measurements of some of the included studies [22, 49, 54] IgM peaks between days 5 and 12 and then drops slowly. IgGs reach peak concentrations after day 20 or so as IgM antibodies disappear. This meta-analysis showed, through meta-regression, that IgG tests did have better sensitivity when the samples were taken longer after the onset of symptoms. This is further corroborated by the lower specificity of IgM antibodies compared to IgG [15]. Only few of the included studies provided data stratified by the time of onset of symptoms, so a separate stratified analysis was not feasible, but this should be a goal for future studies.

This is an important knowledge gap. Timing really matters, because tests that aren’t sensitive to early asymptomatic transmission have limited utility for preventing spread. Consider the distribution of serial infection times (Ferretti et al., Science):

Testing by itself doesn’t do anything to reduce the spread of infection. It’s an enabler: transmission goes down only if coronavirus-positive individuals identified through testing change their behavior. That implies a chain of delays:

  • Conduct the test and get the results
  • Inform the positive person
  • Get them into a situation where they won’t infect their coworkers, family, etc.
  • Trace their contacts, test them, and repeat

A test that only achieves peak sensitivity at >5 days may not leave much time for these delays to play out, limiting the effectiveness of contact tracing and isolation. A test that peaks at day 20 would be pretty useless (though interesting for surveillance and other purposes).

Consider Long et al., Antibody responses to SARS-CoV-2 in COVID-19 patients: the perspective application of serological tests in clinical practice:

Seroconversion rates of 30% at onset of symptoms seem problematic, given the significant pre-symptomatic transmission implied by the Ferretti, Liu & Nishiura results on serial infection times. I hope the US testing strategy relies on lots of fast tests, not just lots of tests.

See also:

Potential Rapid Diagnostics, Vaccine and Therapeutics for 2019 Novel Coronavirus (2019-nCoV): A Systematic Review.

Antibody surveys suggesting vast undercount of coronavirus infections may be unreliable in Science

h/t Yioryos Stamboulis

A coronavirus prediction you can bank on

How many cases will there be on June 1? Beats me. But there’s one thing I’m sure of.

My confidence bounds on future behavior of the epidemic are still pretty wide. While there’s good reason to be optimistic about a lot of locations, there are also big uncertainties looming. No matter how things shake out, I’m confident in this:

The antiscience crowd will be out in force. They’ll cherry-pick the early model projections of an uncontrolled epidemic, and use that to claim that modelers predicted a catastrophe that didn’t happen, and conclude that there was never a problem. This is the Cassandra’s curse of all successful modeling interventions. (See Nobody Ever Gets Credit for Fixing Problems that Never Happened for a similar situation.)

But it won’t stop there. A lot of people don’t really care what the modelers actually said. They’ll just make stuff up. Just today I saw a comment at the Bozeman Chronicle to the effect of, “if this was as bad as they said, we’d all be dead.” Of course that was never in the cards, or the models, but that doesn’t matter in Dunning Krugerland.

Modelers, be prepared for a lot more of this. I think we need to be thinking more about defensive measures, like forecast archiving and presentation of results only with confidence bounds attached. However, it’s hard to do that and to produce model results at a pace that keeps up with the evolution of the epidemic. That’s something we need more infrastructure for.

Coronavirus Curve-fitting OverConfidence

This is a follow-on to The Normal distribution is a bad COVID19 model.

I understand that the IHME model is now more or less the official tool of the Federal Government. Normally I’m happy to see models guiding policy. It’s better than the alternative: would you fly in a plane designed by lawyers? (Apparently we have been.)

However, there’s nothing magic about a model. Using flawed methods, bad data, the wrong boundary, etc. can make the results GIGO. When a bad model blows up, the consequences can be just as harmful as any other bad reasoning. In addition, the metaphorical shrapnel hits the rest of us modelers. Currently, I’m hiding in my foxhole.

On top of the issues I mentioned previously, I think there are two more problems with the IHME model:

First, they fit the Normal distribution to cumulative cases, rather than incremental cases. Even in a parallel universe where the nonphysical curve fit was optimal, this would lead to understatement of the uncertainty in the projections.

Second, because the model has no operational mapping of real-world concepts to equation structure, you have no hooks to use to inject policy changes and the uncertainty associated with them. You have to construct some kind of arbitrary index and translate that to changes in the size and timing of the peak in an unprincipled way. This defeats the purpose of having a model.

For example, from the methods paper:

A covariate of days with expected exponential growth in the cumulative death rate was created using information on the number of days after the death rate exceeded 0.31 per million to the day when different social distancing measures were mandated by local and national government: school closures, non-essential business closures including bars and restaurants, stay-at-home recommendations, and travel restrictions including public transport closures. Days with 1 measure were counted as 0.67 equivalents, days with 2 measures as 0.334 equivalents and with 3 or 4 measures as 0.

This postulates a relationship that has only the most notional grounding. There’s no concept of compliance, nor any sense of the effect of stringency and exceptions.

In the real world, there’s also no linear relationship between “# policies implemented” and “days of exponential growth.” In fact, I would expect this to be extremely nonlinear, with a threshold effect. Either your policies reduce R0 below 1 and the epidemic peaks and shrinks, or they don’t, and it continues to grow at some positive rate until a large part of the population is infected. I don’t think this structure captures that reality at all.
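To make the quoted weighting concrete, here is a minimal sketch of how such a covariate could be computed. The weights for 1 to 4 measures come straight from the quote. The weight of 1.0 for days with no measures is an assumption (the quote doesn't state it), and the timeline is entirely hypothetical. This is a reading of the description, not IHME's actual code.

```python
# Weight per day, keyed by the number of distancing measures in force.
# 1-4 are from the quoted methods text; the 0-measure weight is assumed.
EQUIVALENT_WEIGHT = {0: 1.0, 1: 0.67, 2: 0.334, 3: 0.0, 4: 0.0}

def equivalent_growth_days(measures_per_day: list[int]) -> float:
    """Sum the weighted days, given the count of active measures on each day."""
    return sum(EQUIVALENT_WEIGHT[n] for n in measures_per_day)

# Hypothetical timeline after the death-rate threshold is crossed:
# 5 days with no measures, 3 with one, 4 with two, then 10 with three.
timeline = [0] * 5 + [1] * 3 + [2] * 4 + [3] * 10

total = equivalent_growth_days(timeline)
print(f"equivalent days of exponential growth: {total:.3f}")
```

Note that nothing in this index knows which measures were adopted, how strict they were, or whether anyone complied. Each additional measure just subtracts about a third of a day.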

That’s why, in the IHME figure above (retrieved yesterday), you don’t see any scenarios in which the epidemic fizzles, because we get lucky and warm weather slows the virus, or there are many more mild cases than we thought. You also don’t see any runaway scenarios in which measures fail to bring R0 below 1, resulting in sustained growth. Nor is there any possibility of ending measures too soon, resulting in an echo.

For comparison, I ran some sensitivity runs of my model for North Dakota last night. I included uncertainty from fit to data (for example, R0 constrained to fit observations via MCMC) and some a priori uncertainty about effectiveness and duration of measures, and from the literature about fatality rates, seasonality, and unobserved asymptomatics.

I found that I couldn’t exclude the IHME projections from my confidence bounds, so they’re not completely crazy. However, they understate the uncertainty in the situation by a huge margin. They forecast the peak at a fairly definite time, plus or minus a factor of two. With my hybrid-SEIR model, the 95% bounds include variation by a factor of 10. The difference is that their bounds are derived only from curve fitting, and therefore omit a vast amount of structural uncertainty that is represented in my model.

Who is right? We could argue, but since the IHME model is statistically flawed and doesn’t include any direct effect of uncertainty in R0, prevalence of unobserved mild cases, temperature sensitivity of the virus, effectiveness of measures, compliance, travel, etc., I would not put any money on the future remaining within their confidence bounds.