Election Fraud and Benford’s Law

Statistical tests only make sense when the assumed distribution matches the data-generating process.

There are several analyses going around that purport to prove election fraud in PA, because the first digits of vote counts don’t conform to Benford’s Law. Here’s the problem: first digits of vote counts aren’t expected to conform to Benford’s Law. So, you might just as well say that election fraud is proved by Newton’s 3rd Law or Godwin’s Law.

Example of bogus conclusions from naive application of Benford’s Law.

Benford’s Law describes the distribution of first digits when the set of numbers evaluated derives from a scale-free or Power Law distribution spanning multiple orders of magnitude. Lots of processes generate numbers like this, including Fibonacci numbers and things that grow exponentially. Social networks and evolutionary processes generate Zipf’s Law, which is Benford-conformant.
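To make that concrete, here is a minimal sketch that compares the Benford expectation, P(d) = log10(1 + 1/d), with the first digits of an exponentially growing sequence. The choice of the first 1000 powers of 2 and the function names are mine, purely for illustration.

```python
import math
from collections import Counter

def benford_expected():
    """Benford's Law: P(first digit = d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit_freq(values):
    """Observed first-digit frequencies for a list of positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    n = len(values)
    return {d: counts.get(d, 0) / n for d in range(1, 10)}

# Powers of 2 grow exponentially and span about 300 orders of magnitude here.
powers = [2 ** k for k in range(1, 1001)]

expected = benford_expected()
observed = first_digit_freq(powers)
max_gap = max(abs(observed[d] - expected[d]) for d in range(1, 10))

for d in range(1, 10):
    print(d, round(expected[d], 3), round(observed[d], 3))
print("largest gap:", round(max_gap, 4))
```

The two columns should track each other closely, which is what conformance looks like when the scale-free assumption actually holds.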

The trouble is that vote counts often lack this property. Voting district sizes tend to be similar and truncated above (the result of dividing a jurisdiction into roughly equal chunks), and vote proportions tend to be similar due to gerrymandering and other feedback processes. The counts therefore span a narrow range rather than multiple orders of magnitude, which violates the Benford’s Law assumptions, especially for the first digit.

This doesn’t mean the analysis can’t be salvaged. As a check, look at other elections for the same region. Check the confidence bounds on the test, rather than simply plotting the sample against expectations. Examine the 2nd or 3rd digits to minimize truncation bias. Best of all, throw out Benford and directly simulate a distribution of digits based on assumptions that apply to the specific situation. If what you’re reading hasn’t done these things, it’s probably rubbish.
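As a rough sketch of the "directly simulate" option, the following generates vote counts for made-up precincts and compares their first digits to Benford. Everything about the precincts (sizes of 400 to 1200 voters, a candidate share clustered near 60%) is an assumption for illustration, not real election data; a serious analysis would substitute the actual district structure.

```python
import math
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

def first_digit_freq(values):
    """First-digit frequencies for digits 1..9, as a list."""
    counts = Counter(int(str(v)[0]) for v in values)
    return [counts.get(d, 0) / len(values) for d in range(1, 10)]

benford = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Hypothetical precincts: similar sizes and similar vote shares.
votes = []
for _ in range(5000):
    size = random.randint(400, 1200)                          # assumed size range
    share = min(max(random.gauss(0.60, 0.08), 0.05), 0.95)   # assumed share
    votes.append(max(1, round(size * share)))

simulated = first_digit_freq(votes)
max_gap = max(abs(s - b) for s, b in zip(simulated, benford))

for d, (s, b) in enumerate(zip(simulated, benford), start=1):
    print(d, round(s, 3), round(b, 3))
print("largest gap from Benford:", round(max_gap, 3))
```

With counts confined mostly to the low hundreds, a leading 1 is rare, so a perfectly clean election under these assumptions would "fail" a naive first-digit Benford test.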

This is really no different from any other data analysis problem. A statistical test is meaningless unless the assumptions of the test match the phenomena to be tested. You can’t look at lightning strikes the same way you look at coin tosses. You can’t use ANOVA when the samples are non-Normal or have unequal variances, because it assumes Normality and equal variances (homoscedasticity). You can’t make a linear fit to a curve, and you can’t ignore dynamics. (Well, you can actually do whatever you want, but don’t propose that the results mean anything.)

Climate Skeptics in Search of Unity

The most convincing thing about mainstream climate science is not that the models are so good, but that the alternatives are so bad.

Climate skeptics have been at it for 40 years, but have produced few theories or predictions that have withstood the test of time. Even worse, where there were once legitimate measurement issues and model uncertainties to discuss, as those have fallen one by one, the skeptics are doubling down on theories that rely on “alternative” physics. The craziest ideas get the best acronyms and metaphors. The allegedly skeptical audience welcomes these bizarre proposals with enthusiasm. As they turn inward, they turn on each other.

The latest example is in the Lungs of Gaia at WUWT:

A fundamental concept at the heart of climate science is the contention that the solar energy that the disk of the Earth intercepts from the Sun’s irradiance must be diluted by a factor of 4. This is because the surface area of a globe is 4 times the interception area of the disk silhouette (Wilde and Mulholland, 2020a).

This geometric relationship of divide by 4 for the insolation energy creates the absurd paradox that the Sun shines directly onto the surface of the Earth at night. The correct assertion is that the solar energy power intensity is collected over the full surface area of a lit hemisphere (divide by 2) and that it is the thermal radiant exhaust flux that leaves from the full surface area of the globe (divide by 4).

Setting aside the weird pedantic language that seems to infect those with Galileo syndrome, these claims are simply a collection of errors. The authors seem to be unable to understand the geometry of solar flux, even though this is taught in first-year physics.

Some real college physics (divide by 4).

The “divide by 4” arises because the solar flux intercepted by the earth is over an area pi*r^2 (the disk of the earth as seen from the sun) while the average flux normal to the earth’s surface is over an area 4*pi*r^2 (the area of a sphere).
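The geometry is easy to check numerically. This sketch takes the 1368 W/m^2 figure used here, computes the disk-to-sphere ratio directly, and then confirms it by averaging the flux normal to the surface over the whole sphere (zero on the night side). The function name and the midpoint-rule integration are my own choices.

```python
import math

S = 1368.0  # solar constant in W/m^2, the value used in the text

def mean_flux_over_sphere(n=100_000):
    """Area-weighted mean of the flux normal to the surface over the whole sphere.

    theta is the angle from the subsolar point; the night side (cos < 0)
    receives nothing. Midpoint rule in theta with sin(theta) as area weight.
    """
    d = math.pi / n
    total = 0.0
    weight = 0.0
    for i in range(n):
        theta = (i + 0.5) * d
        w = math.sin(theta)
        total += S * max(math.cos(theta), 0.0) * w
        weight += w
    return total / weight

r = 1.0  # radius cancels, so any value works
geometric = S * (math.pi * r**2) / (4 * math.pi * r**2)  # disk area / sphere area
numeric = mean_flux_over_sphere()

print(geometric, round(numeric, 1))
```

Both routes give about 342 W/m^2, the global, time-averaged figure used in simple energy budgets.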

The authors’ notion of “divide by 2” resulting in 1368/2 = 684 W/m^2 average is laughable because it implies that the sun is somehow like a luminous salad bowl that delivers light at 1368 W/m^2 normal to the surface of one side of the earth only. That would make for pretty interesting sunsets.

In any case, none of this has much to do with the big climate models, which don’t “dilute” anything, because they have explicit geometry of the earth and day/night cycles with small time steps. So, all of this is already accounted for.

To his credit, Roy Spencer – a hero of the climate skeptics movement of the same magnitude as Richard Lindzen – arrives early to squash this foolishness:

How can some people not comprehend that the S/4 value of solar flux does NOT represent the *instantaneous* TOA illumination of the whole Earth, but instead the time-averaged (1-day or longer) solar energy available to the whole Earth. There is no flat-Earth assumption involved (in fact, dividing by 4 is because the Earth is approximately spherical). It is used in only simplistic treatments of Earth’s average energy budget. Detailed calculations (as well as 4D climate models as well as global weather forecast models) use the full day-night (and seasonal) cycle in solar illumination everywhere on Earth. The point isn’t even worth arguing about.

Responding to the clueless authors:

Philip Mulholland, you said: “Please confirm that the TOA solar irradiance value in a climate model cell follows the full 24 hour rotational cycle of daytime illumination and night time darkness.”

Oh, my, Philip… you cannot be serious.

Every one of the 24+ climate models run around the world have a full diurnal cycle at every gridpoint. This is without question. For example, for models even 20+ years ago start reading about the diurnal cycles in the models on page 796 of the following, which was co-authored by representatives from all of the modeling groups: https://www.ipcc.ch/site/assets/uploads/2018/02/WG1AR5_Chapter09_FINAL.pdf

Finally:

Philip, Ed Bo has hit the nail on the head. Your response to him suggests you do not understand even the basics of climate modeling, and I am a little dismayed that your post appeared on WUWT.

Undeterred, the WUWT crowd then proceeds to savage anyone, including their erstwhile hero Spencer, who dares to challenge the new “divide by 2” orthodoxy.

Dr roy with his fisher price cold warms hot physics tried to hold the line for the luke-warmers, but soon fecked off when he knew he would be embarrassed by the grown-ups in the room…..

This is not the first time a WUWT post has claimed to overturn climate science. There are others, like the 2011 Unified Theory of Climate. It’s basically technobabble, notable primarily for its utter obscurity in the nine years following. It’s not really worth analyzing, though I am a little curious how a theory driven by static atmospheric mass explains dynamics. Also, I notice that the perfect fit to the data for 7 planets in Fig. 5 has 7 parameters – ironic, given that accusations of overparameterization are a perennial favorite of skeptics. Amusingly, one of the authors of the “divide by two” revolution (Wilde) appears in the comments to point out his alternative “Unifying” Theory of Climate.

Are these alternate theories in agreement, mutually exclusive, or just not even wrong? It would be nice if skeptics would get together and decide which of their grand ideas is the right one. Does atmospheric pressure run the show, or is it sunspots? And which fundamentals that mathematicians and physicists screwed up have eluded verification for all these years? Is it radiative transfer, or the geometry of spheres and disks? Is energy itself misdefined? Inquiring minds want to know.

The bottom line is that Roy Spencer is right. It isn’t worth arguing about these things, any more than it’s worth arguing with flat earthers or perpetual motion enthusiasts. Engaging will just leave you wondering if proponents are serious, as in seriously deluded, or just yanking your chain while keeping a straight face.

Confusing the decision rule with the system

In the NYT:

To avoid quarantining students, a school district tries moving them around every 15 minutes.

Oh no.

To reduce the number of students sent home to quarantine after exposure to the coronavirus, the Billings Public Schools, the largest school district in Montana, came up with an idea that has public health experts shaking their heads: Reshuffling students in the classroom four times an hour.

The strategy is based on the definition of a “close contact” requiring quarantine — being within 6 feet of an infected person for 15 minutes or more. If the students are moved around within that time, the thinking goes, no one will have had “close contact” and be required to stay home if a classmate tests positive.

For this to work, there would have to be a nonlinearity in the dynamics of transmission. For example, if the expected number of infections from 2 students interacting with an infected person for 10 minutes each were less than the number from one student interacting with an infected person for 20 minutes, there might be some benefit. This would be similar to a threshold in a dose-response curve. Unfortunately, there’s no evidence for such an effect – if anything, increasing the number of contacts by fragmentation makes things worse.
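Here is a toy comparison of the two cases. It assumes a simple exponential dose-response, p = 1 - exp(-rate * minutes), with a hypothetical per-minute hazard of 0.01; neither the functional form nor the number comes from the article. The second function encodes what the reshuffling policy implicitly needs to be true: zero risk below 15 minutes.

```python
import math

def p_infect(minutes, rate=0.01):
    """Infection probability after `minutes` of exposure.

    Exponential dose-response; `rate` is a hypothetical per-minute hazard
    chosen only for illustration.
    """
    return 1.0 - math.exp(-rate * minutes)

def p_threshold(minutes, cutoff=15, rate=0.01):
    """The policy's implicit assumption: no risk at all below the cutoff."""
    return 0.0 if minutes < cutoff else p_infect(minutes, rate)

# Expected infections without a threshold.
one_long = 1 * p_infect(20)    # one student exposed for 20 minutes
two_short = 2 * p_infect(10)   # two students exposed for 10 minutes each

# Expected infections if the 15-minute rule were physics.
two_short_threshold = 2 * p_threshold(10)

print(round(one_long, 4), round(two_short, 4), two_short_threshold)
```

Under the smooth curve, splitting the exposure across two students yields slightly more expected infections, not fewer. Only the threshold version rewards shuffling, and that is the version with no evidence behind it.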

Scientific reasoning has little to do with the real motivation:

Greg Upham, the superintendent of the 16,500-student school district, said in an interview that contact tracing had become a huge burden for the district, and administrators were looking for a way to ease the burden when they came up with the movement idea. It was not intended to “game the system,” he said, but rather to encourage the staff to be cognizant of the 15-minute window.

Regardless of the intent, this is absolutely an example of gaming the system. However, you can game rules, but you can’t fool Mother Nature. The 15-minute window is a decision rule for prioritizing contact tracing, invented in the context of normal social mixing. Administrators have confused it with a physical phenomenon. Whether or not they intended to game the system, they’re likely to get what they want: less contact tracing. This makes the policy doubly disastrous: it increases transmission, and it diminishes the preventive effects of contact tracing and quarantine. In short order, that means more infections. A few doublings of cases will quickly overwhelm any reduction in contact tracing burden from shuffling students around.

I think the administrators who came up with this might want to consider adding systems thinking to the curriculum.

Should Systems Thinkers be on Social Media?

Using social media is a bit like dining out these days. You get some tasty gratification and social interaction, but you might catch something nasty, or worse, pass it along. I use Facebook and Twitter to get the word out on my work, and I learn interesting things, but these media are also the source of 90% of my exposure to fake news, filter bubbles, FOMO, AI bias, polarization, and rank stupidity.

If my goal is to make the world a better place by sharing insights about systems, is social media a net positive? Are there particular ways to engage that could make it a win? Since we can’t turn off the system, how do we coax it into working better for us? This causal loop diagram represents my preliminary thinking about some of the issues.

I think there are three key points.

First, social media is not really different from offline movements, or the internet as a whole. It’s just one manifestation. Like others, it is naturally primed to grow, due to positive feedback. Networks confer benefits to members that increase with scale, and networks reinvest in things that make the network more attractive. This is benign and universal (at least until the network uses AI to weaponize users’ information against them). These loops are shown in blue.

Second, there are good reasons to participate. By sharing good content, I can assist the diffusion of knowledge about systems, which helps people to manage the world. In addition, I get personal rewards for doing so, which increases my ability to do more of the same in the future. (Green loop.) There are probably also some rewards to the broader systems thinking community from enhanced ability to share information and act coherently.

But the dark side is that the social media ecosystem is an excellent growth medium for bad ideas and bad people who profit from them. Social platforms have no direct interest in controlling this, because they make as much money from an ad placed by Russian bots as they do from a Nike ad. Worse, they may actively oppose measures to control information pollution by capturing regulators and legislators. (Red loops.)

So far, I’m finding that the structure of the problem – a nest of good and evil positive feedback loops – makes it very hard to decide which effects will win out. Are we getting leverage from a system that helps share good ideas, or merely feeding a monster that will ultimately devour us? The obvious way to find out is to develop a more formal model, but that’s a rather time-consuming endeavor. So, what do you think? Retire from the fray? Find a better outlet? Put the technology to good use? Where’s the good work in this area?

MSU Covid Evaluation

Well, my prediction of 10/9 covid cases at MSU, made on 10/6 using 10/2 data, was right on the money: I extrapolated 61 from cumulative cases, and the actual number was 60. (I must have made a typo or mental math error in reporting the expected cumulative cases, because 157+61 <> 207. The number I actually extrapolated was 157*e^.33 = 218 = 157 + 61.)
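For the record, the corrected arithmetic can be reproduced in a few lines. The variable names are mine; the 157 base and the 0.33/week growth rate are the figures quoted above.

```python
import math

base_cumulative = 157   # cumulative cases used in the extrapolation
weekly_growth = 0.33    # continuous growth rate per week

projected_cumulative = base_cumulative * math.exp(weekly_growth)
projected_new = projected_cumulative - base_cumulative

print(round(projected_cumulative), round(projected_new))
```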

That’s pretty darn good, though I shouldn’t take too much credit, because my confidence bounds would have been wide, had I included them in the letter. Anyway, it was a fairly simpleminded exercise, far short of calibrating a real model.

Interestingly, the 10/16 release has 65 new cases, which is lower than the next simple extrapolation of 90 cases. However, Poisson noise in discrete events like this is large (the variance equals the mean, so this result is about two and a half standard deviations low), and we still don’t know how much testing is happening. I would still guess that case growth is positive, with R above 1, so it’s still an open question whether MSU will make it to finals with in-person classes.
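The size of that deviation is a one-liner to check, treating the weekly count as Poisson with mean equal to the extrapolated 90 (which is itself only a rough guess):

```python
import math

expected = 90   # simple extrapolation for the 10/16 release
observed = 65   # reported new cases

sd = math.sqrt(expected)        # Poisson: variance equals the mean
z = (observed - expected) / sd  # deviation in standard deviations

print(round(sd, 1), round(z, 2))
```

This ignores overdispersion from clustering and variable testing, both of which would widen the real uncertainty.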

Meanwhile, the increased caseload in Gallatin County means that contact tracing and quarantine resources are now strained. This kicks off a positive feedback: increased caseload means that fewer contacts are traced and quarantined. That in turn means more transmission from infected people in the wild, further increasing caseload. MSU is relying on county resources for testing and tracing, so presumably the university is caught in this loop as well.
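A toy version of that loop, with every parameter invented for illustration (fixed tracing capacity of 60 cases per week, weekly reproduction ratio of 0.9 for traced cases and 1.5 for untraced ones), shows the tipping behavior. It is a sketch of the feedback structure, not a calibrated model of Gallatin County.

```python
def simulate(weeks=8, cases=50.0, capacity=60.0, r_traced=0.9, r_untraced=1.5):
    """Weekly cases when contact tracing capacity is fixed.

    All parameters are hypothetical. `capacity` is the number of cases per
    week whose contacts can be traced; cases beyond that go untraced and
    transmit at the higher ratio.
    """
    history = [cases]
    for _ in range(weeks):
        traced = min(1.0, capacity / cases)  # share of cases that get traced
        r_eff = traced * r_traced + (1.0 - traced) * r_untraced
        cases = cases * r_eff
        history.append(cases)
    return history

below = simulate(cases=50.0)  # starts within tracing capacity
above = simulate(cases=80.0)  # starts beyond tracing capacity

print([round(c) for c in below])
print([round(c) for c in above])
```

Starting below capacity, cases decay. Starting above it, the traced share shrinks each week, so the growth rate itself rises over time, which is the signature of a reinforcing loop.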

MSU Covid – what will tomorrow bring?

The following is a note I posted to a local listserv earlier in the week. It’s an example of back-of-the-envelope reasoning informed by experience with models, but without actually calibrating a model to verify the results. Often that turns out badly. I’m posting this to archive it for review and discussion later, after new data becomes available (as early as tomorrow, I expect).

I thought about responding to this thread two weeks ago, but at the time numbers were still very low, and data was scarce. However, as an MSU parent, I’ve been watching the reports closely. Now the picture is quite different.

If you haven’t discovered it, Gallatin County publishes MSU stats at the end of the weekly Surveillance Report, found here:

https://www.healthygallatin.org/about-us/press-releases/

For the weeks ending 9/10, 9/17, 9/24, and 10/2, MSU had 3, 7, 66, and 43 new cases. Reported active cases are slightly lower, which indicates that the active case duration is less than a week. That’s inconsistent with the two-week quarantine period normally recommended. It’s hard to see how this could happen, unless quarantine compliance is low or delays cause much of the infectious period to be missed (not good either way).

The huge jump two weeks ago is a concern. That’s growth of 32% per day, faster than the typical uncontrolled increase in the early days of the epidemic. That could happen from a superspreader event, but more likely reflects insufficient testing to detect a latent outbreak.
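That daily rate comes from treating the jump from 7 to 66 weekly cases as one week of continuous exponential growth. A quick check, with the implied doubling time added for perspective:

```python
import math

week_ending_9_17 = 7    # new cases, week ending 9/17
week_ending_9_24 = 66   # new cases, week ending 9/24

daily_rate = math.log(week_ending_9_24 / week_ending_9_17) / 7
doubling_days = math.log(2) / daily_rate

print(round(daily_rate, 3), round(doubling_days, 1))
```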

Unfortunately they still don’t publish the number of tests done at MSU, so it’s hard to interpret any of the data. We know the upper bound, which is the 2000 or so tests per week reported for all of Gallatin county. Even if all of those were dedicated to MSU, it still wouldn’t be enough to put a serious dent in infection through testing, tracing and isolation. Contrast this with Colby College, which tests everyone twice a week, a test density about 100x greater than Gallatin County+MSU.

In spite of the uncertainty, I think it’s wrong to pin Gallatin County’s increase in cases on MSU. First, COVID prevalence among incoming students was unlikely to be much higher than in the general population. Second, Gallatin County is much larger than MSU, and students interact largely among themselves, so it would be hard for them to infect the broad population. Third, the county has its own reasons for an increase, like reopening schools. Depending on when you start the clock, MSU cases are 18 to 28% of the county total, which is at worst 50% above per capita parity. Recently, there is one feature of concern – the age structure of cases (bottom of page 3 of the surveillance report). This shows that the current acceleration is driven by the 10-19 and 20-29 age groups.

As a wild guess, reported cases might understate the truth by a factor of 10. That would mean 420 active cases at MSU when you account for undetected asymptomatics and presymptomatic untested contacts. That’s out of a student/faculty population of 20,000, so it’s roughly 2% prevalence. A class of 10 would have a 1/5 chance of a positive student, and for 20 it would be 1/3. But those #s could easily be off by a factor of 2 or more.
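The class-size odds follow from the chance that at least one of n students is positive, 1 - (1 - p)^n. This assumes students land in classes independently of infection status, which is a simplification of mine; clustering among friends or dorms would change the numbers.

```python
active_cases = 420    # wild-guess true active cases (10x reported)
population = 20_000   # students plus faculty

prevalence = active_cases / population

def p_at_least_one(class_size, p=prevalence):
    """Chance that a class contains at least one positive person,
    assuming independent draws from the campus population."""
    return 1 - (1 - p) ** class_size

p10 = p_at_least_one(10)
p20 = p_at_least_one(20)

print(round(prevalence, 3), round(p10, 2), round(p20, 2))
```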

Just extrapolating the growth rate (33%/week for cumulative cases), this Friday’s report would be for 61 new cases, 207 cumulative. If you keep going to finals, the cumulative would grow 10x – which basically means everyone gets it at some point, which won’t happen. I don’t know what quarantine capacity is, but suppose that MSU can handle a 300-case week (that’s where things fell apart at UNC). If so, the limit is reached in less than 5 weeks, just short of finals.
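The timing estimates can be sketched the same way. This assumes weekly new cases keep growing at the same 33%/week as the cumulative count, starting from the 61 projected for this Friday; the 300-case capacity is the supposition stated above, not a known figure.

```python
import math

weekly_growth = 0.33    # continuous growth rate per week
next_week_new = 61      # projected new cases in this Friday's report
capacity = 300          # supposed weekly case-handling limit

# Weeks after this Friday's report until a 300-case week.
weeks_to_capacity = math.log(capacity / next_week_new) / weekly_growth

# Weeks needed for cumulative cases to grow 10x at the same rate.
weeks_for_10x = math.log(10) / weekly_growth

print(round(weeks_to_capacity, 1), round(weeks_for_10x, 1))
```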

I’d say these numbers are discouraging. As a parent, I’m not yet concerned enough to pull my kids out, but they’re nonresidential so their exposure is low. Around classrooms on campus, compliance with masks, sanitizing and distancing is very good – certainly better than it is in town. My primary concern at present is that we don’t know what’s going on, because the published statistics are insufficient to make reliable judgments. Worse, I suspect that no one knows what’s going on, because there simply isn’t enough testing to tell. Tests are pretty cheap now, and the disruption from a surprise outbreak is enormous, so that seems penny wise and pound foolish. The next few weeks will reveal whether we are seeing random variation or the beginning of a large outbreak, but it would be far better to have enough surveillance and data transparency to know now.