Eugenics rebooted – what could go wrong?

Does DNA IQ testing create a meritocracy, or merely reinforce existing biases?

Technology Review covers new efforts to use associations between DNA and IQ.

… Intelligence is highly heritable and predicts important educational, occupational and health outcomes better than any other trait. Recent genome-wide association studies have successfully identified inherited genome sequence differences that account for 20% of the 50% heritability of intelligence. These findings open new avenues for research into the causes and consequences of intelligence using genome-wide polygenic scores that aggregate the effects of thousands of genetic variants.

The new genetics of intelligence

Robert Plomin and Sophie von Stumm

I have no doubt that there’s much to be learned here. However, research is not all they’re proposing:

IQ GPSs will be used to predict individuals’ genetic propensity to learn, reason and solve problems, not only in research but also in society, as direct-to-consumer genomic services provide GPS information that goes beyond single-gene and ancestry information. We predict that IQ GPSs will become routinely available from direct-to-consumer companies along with hundreds of other medical and psychological GPSs that can be extracted from genome-wide genotyping on SNP chips. The use of GPSs to predict individuals’ genetic propensities requires clear warnings about the probabilistic nature of these predictions and the limitations of their effect sizes (BOX 7).

Although simple curiosity will drive consumers’ interests, GPSs for intelligence are more than idle fortune telling. Because intelligence is one of the best predictors of educational and occupational outcomes, IQ GPSs will be used for prediction from early in life before intelligence or educational achievement can be assessed. In the school years, IQ GPSs could be used to assess discrepancies between GPSs and educational achievement (that is, GPS-based overachievement and underachievement). The reliability, stability and lack of bias of GPSs make them ideal for prediction, which is essential for the prevention of problems before they occur. A ‘precision education’ based on GPSs could be used to customize education, analogous to ‘precision medicine’

There are two ways “precision education” might be implemented. An egalitarian model would use information from DNA IQ measurements to customize resource allocations, so that all students could perform up to some common standard:

An efficiency model, by contrast, would use IQ measurements to set achievement expectations for each student, and customize resources to ensure that students who are underperforming relative to their DNA get a boost:

This latter approach is essentially a form of tracking, in which DNA is used to get an early read on who’s destined to flip bonds, and who’s destined to flip burgers.

One problem with this scheme is noise (as the authors note, seemingly contradicting their own abstract’s claim of reliability and stability). Consider the effect of a student receiving a spuriously low DNA IQ score. Under the egalitarian scheme, they receive more educational resources (enabling them to overperform), while under the efficiency scheme, resources would be lowered, leading self-fulfillment of the predicted low performance. The authors seem to regard this as benign and self-correcting:

By contrast, GPSs are ‘less dangerous’ because they are intrinsically probabilistic, not hardwired and deterministic like single-gene disorders. It is important to recall here that although all complex traits are heritable, none is 100% heritable. A similar logic can be applied to IQ scores: although they have great predic­tive validity for key life outcomes, IQ is not determin­istic but probabilistic. In short, an individual is always more than the sum of their genes or their IQ scores.

I think this might be true when you consider the local effects on the negative loops governing resource allocation. But I don’t think that remains true when you put it in context. Education is a nest of positive feedbacks. This creates path dependence that amplifies errors in resource allocation, whether they come from subjective teacher impressions or DNA measurements.

In a perfect world, DNA-IQ provides an independent measurement that’s free of those positive feedbacks. In that sense, it’s perfectly meritocratic:

But how do you decide what to measure? Are the measurements good, or just another way to institutionalize bias? This is hotly contested. Let’s suppose that problems of gender and race/ethnicity bias have been, or can be solved. There are still questions about what measurements correlate with better individual or societal outcomes. At some point, implicit or explicit choices have to be made, and these are not value-free. They create reinforcing feedbacks:

I think it’s inevitable that, like any other instrument, DNA IQ scores are going to reflect the interests of dominant groups in society. (At a minimum, I’d be willing to bet that IQ tests don’t measure things that would result in low scores for IQ test designers.) If that means more Einsteins, Bachs and Ghandis, maybe it’s OK. But I don’t think that’s guaranteed to lead to a good outcome. First, there’s no guarantee that a society composed of apparently high-performing individuals is in itself high-performing. Second, the dominant group may be dominant, not by virtue of faster CPUs in their heads, but something less appetizing.

I think there’s no guarantee that DNA IQ will not reflect attributes that are dysfunctional for society. We would hate to inadvertently produce more Stalins and Mengeles by virtue of inadvertent correlations with high achievement of less virtuous origin. And certainly, like any instrument used for high-stakes decisions, the pressure to distort and manipulate results will increase with use.

Note that if education is really egalitarian, the link between Measured IQ and Educational Resources Allocated reverses polarity, becoming negative. Then the positive loops become negative loops, and a lot of these problems go away. But that’s not often a choice societies make, presumably because egalitarian education is in itself contrary to the interests of dominant groups.

I understand researchers’ optimism for this technology in the long run. But for now, I remain wary, due to the decided lack of systems thinking about the possible side effects. In similar circumstances, society has made poor choices about teacher value added modeling, easily negating any benefits it might have had. I’m expecting a similar outcome here.

Loopy

I just gave Loopy a try, after seeing Gene Bellinger’s post about it.

It’s cool for diagramming, and fun. There are some clever features, like drawing a circle to create a node (though I was too dumb to figure that out right away). Its shareability and remixing are certainly useful.

However, I think one must be very cautious about simulating causal loop diagrams directly. A causal loop diagram is fundamentally underspecified, which is why no method of automated conversion of CLDs to models has been successful.

In this tool, behavior is animated by initially perturbing the system (e.g, increase the number of rabbits in a predator-prey system). Then you can follow the story around a loop via animated arrow polarity changes – more rabbits causes more foxes, more foxes causes less rabbits. This is essentially the storytelling method of determining loop polarity, which I’ve used many times to good effect.

However, as soon as the system has multiple loops, you’re in trouble. Link polarity tells you the direction of change, but not the gain or nonlinearity. So, when multiple loops interact, there’s no way to determine which is dominant. Also, in a real system it matters which nodes are stocks; it’s not sufficient to assume that there must be at least one integration somewhere around a loop.

You can test this for yourself by starting with the predator-prey example on the home page. The initial model is a discrete oscillator (more rabbits -> more foxes -> fewer rabbits). But the real system is nonlinear, with oscillation and other possible behaviors, depending on parameters. In Loopy, if you start adding explicit births and deaths, which should get you closer to the real system, simulations quickly result in a sea of arrows in conflicting directions, with no way to know which tendency wins. So, the loop polarity simulation could be somewhere between incomprehensible and dead wrong.

Similarly, if you consider an SIR infection model, there are three loops of interest: spread of infection by contact, saturation from running out of susceptibles, and recovery of infected people. Depending on the loop gains, it can exhibit different behaviors. If recovery is stronger than spread, the infection dies out. If spread is initially stronger than recovery, the infection shifts from exponential growth to goal seeking behavior as dominance shifts nonlinearly from the spread loop to the saturation loop.

I think it would be better if the tool restricted itself to telling the story of one loop at a time, without making the leap to system simulations that are bound to be incorrect in many multiloop cases. With that simplification, I’d consider this a useful item in the toolkit. As is, I think it could be used judiciously for explanations, but for conceptualization it seems likely to prove dangerous.

My mind goes back to Barry Richmond’s approach to systems here. Causal loop diagrams promote thinking about feedback, but they aren’t very good at providing an operational description of how things work. When you’re trying to figure out something that you don’t understand a priori, you need the bottom-up approach to synthesize the parts you understand into the whole you’re grasping for, so you can test whether your understanding of processes explains observed behavior. That requires stocks and flows, explicit goals and actual states, and all the other things system dynamics is about. If we could get to that as elegantly as Loopy gets to CLDs, that would be something.

Tasty Menu

From the WPI online graduate program and courses in system dynamics:

Truly a fine lineup!

Not even wrong: a school board's discussion of systems thinking

Socialism. Communism. “Nazism.” American Exceptionalism. Indoctrination. Buddhism. Meditation. “Americanism.” These are not words or terms one would typically expect to hear in a Winston-Salem/Forsyth County School Board meeting. But in the Board’s last meeting on October 9th, they peppered the statements of public commenters and Board Members alike.

The object of this invective? Systems thinking. You really have to read part 1 and part 2 of Camel City Dispatch’s article to get an appreciation for the school board’s discussion of the matter.

I know that, as a systems thinker, I should look for the unstated assumptions that led board members to their critiques, and establish a constructive dialog. But I just can’t do it – I have to call out the fools. While there are some voices of reason, several of the board members and commenters apparently have no understanding of the terms they bandy about, and have no business being involved in the education of anyone, particularly children.

The low point of the exchange:

Jeannie Metcalf said she “will never support anything that has to do with Peter Senge… I don’t care what [the teachers currently trained in System’s Thinking] are teaching. I don’t care what lessons they are doing. He’s is trying to sell a product. Once it insidiously makes its way into our school system, who knows what he’s going to do. Who knows what he’s going to do to carry out his Buddhist way of thinking and his hatred of Capitalism. I know y’all are gonna be thinkin’ I’m a crazy person, but I’ve been around a long time.”

Yep, you’re crazy all right. In your imaginary parallel universe, “hatred of capitalism” must be a synonym for writing one of the most acclaimed business books ever, sitting at one of the best business schools in the world, and consulting at the highest levels of many Fortune 50 companies.

The common thread among the ST critics appears to be a total failure to actually observe classrooms combined with shoot-the-messenger reasoning from consequences. They see, or imagine, a conclusion that they don’t like, something that appears vaguely environmental or socialist, and assume that it must be part of the hidden agenda of the curriculum. In fact, as supporters pointed out, ST is a method, which could as easily be applied to illustrate the benefits of individualism, markets, or whatnot, as long as they are logically consistent. Of course, if one’s pet virtue has limits or nuances, ST may also reveal those – particularly when simulation is used to formalize arguments. That is what the critics are really afraid of.

Kon-Tiki & the STEM workforce

I don’t know if Thor Heyerdahl had Polynesian origins or Rapa Nui right, but he did nail the stovepiping of thinking in organizations:

“And there’s another thing,” I went on.
“Yes,” said he. “Your way of approaching the problem. They’re specialists, the whole lot of them, and they don’t believe in a method of work which cuts into every field of science from botany to archaeology. They limit their own scope in order to be able to dig in the depths with more concentration for details. Modern research demands that every special branch shall dig in its own hole. It’s not usual for anyone to sort out what comes up out of the holes and try to put it all together.

Carl was right. But to solve the problems of the Pacific without throwing light on them from all sides was, it seemed to me, like doing a puzzle and only using the pieces of one color.

Thor Heyerdahl, Kon-Tiki

This reminds me of a few of my consulting experiences, in which large firms’ departments jealously guarded their data, making global understanding or optimization impossible.

This is also common in public policy domains. There’s typically an abundance of micro research that doesn’t add up to much, because no one has bothered to build the corresponding macro theory, or to target the micro work at the questions you need to answer to build an integrative model.

An example: I’ve been working on STEM workforce issues – for DOE five years ago, and lately for another agency. There are a few integrated models of workforce dynamics – we built several, the BHEF has one, and I’ve heard of efforts at several aerospace firms and agencies like NIH and NASA. But the vast majority of education research we’ve been able to find is either macro correlation studies (not much causal theory, hard to operationalize for decision making) or micro examination of a zillion factors, some of which must really matter, but in a piecemeal approach that makes them impossible to integrate.

An integrated model needs three things: what, how, and why. The “what” is the state of the system – stocks of students, workers, teachers, etc. in each part of the system. Typically this is readily available – Census, NSF and AAAS do a good job of curating such data. The “how” is the flows that change the state. There’s not as much data on this, but at least there’s good tracking of graduation rates in various fields, and the flows actually integrate to the stocks. Outside the educational system, it’s tough to understand the matrix of flows among fields and economic sectors, and surprisingly difficult even to get decent measurements of attrition from a single organization’s personnel records. The glaring omission is the “why” – the decision points that govern the aggregate flows. Why do kids drop out of science? What attracts engineers to government service, or the finance sector, or leads them to retire at a given age? I’m sure there are lots of researchers who know a lot about these questions in small spheres, but there’s almost nothing about the “why” questions that’s usable in an integrated model.

I think the current situation is a result of practicality rather than a fundamental philosophical preference for analysis over synthesis. It’s just easier to create, fund and execute standalone micro research than it is to build integrated models.

The bad news is that vast amounts of detailed knowledge goes to waste because it can’t be put into a framework that supports better decisions. The good news is that, for people who are inclined to tackle big problems with integrated models, there’s lots of material to work with and a high return to answering the key questions in a way that informs policy.

Algebra, Eroding Goals and Systems Thinking

A NY Times editorial wonders, Is Algebra Necessary?*

I think the short answer is, “yes.”

The basic point of having a brain is to predict the consequences of actions before taking them, particularly where those actions might be expensive or fatal. There are two ways to approach this:

  • pattern matching or reinforcement learning – hopefully with storytelling as a conduit for cumulative experience with bad judgment on the part of some to inform the future good judgment of others.
  • inference from operational specifications of the structure of systems, i.e. simulation, mental or formal, on the basis of theory.

If you lack a bit of algebra and calculus, you’re essentially limited to the first option. That’s bad, because a lot of situations require the second for decent performance.

The evidence the article amasses to support abandonment of algebra does not address the fundamental utility of algebra. It comes in two flavors:

  • no one needs to solve certain arcane formulae
  • setting the bar too high for algebra discourages large numbers of students

I think too much reliance on the second point risks creating an eroding goals trap. If you can’t raise the performance, lower the standard:

eroding goals
B. Jana, Wikimedia Commons, Creative Commons Attribution-Share Alike 3.0 Unported

This is potentially dangerous, particularly when you also consider that math performance is coupled with a lot of reinforcing feedback.

As an alternative to formal algebra, the editorial suggests more practical math,

It could, for example, teach students how the Consumer Price Index is computed, what is included and how each item in the index is weighted — and include discussion about which items should be included and what weights they should be given.

I can’t really fathom how one could discuss weighting the CPI in a meaningful way without some elementary algebra, so it seems to me that this doesn’t really solve the problem.

However, I think there is a bit of wisdom here. What earthly purpose does solving the quadratic formula serve, until one is able to map that to some practical problem space? There is growing evidence that even high-performing college students can manipulate symbols without gaining the underlying intuition needed to solve real-world problems.

I think the obvious conclusion is not that we should give up on teaching algebra, but that we should teach it quite differently. It should emerge as a practical requirement, motivated by a student-driven search for the secrets of life and systems thinking in particular.

* Thanks to Richard Dudley for pointing this out.

Is Algebra Necessary?

What drives learning?

Sit down and shut up while I tell you.

One interesting take on this compares countries cross-sectionally to get insight into performance drivers. A colleague dug up Educational Policy and Country Outcomes in International Cognitive Competence Studies. Two pictures from the path analysis are interesting:

Note the central role of discipline. Interestingly, the study also finds that self-report of pleasure reading is negatively correlated with performance. Perhaps that’s a consequence of getting performance through discipline rather than self-directed interest? (It works though.)

More interesting, though, is that practically everything is weak, except the educational level of society – a big positive feedback.

I find this sort of analysis quite interesting, but if I were a teacher, I think I’d be frustrated. In the aggregate international data, there’s precious little to go on when it comes to deciding, “what am I going to do in class today?”

 

 

Teacher value added modeling – my bottom line

I’m still attracted to the idea of objective measurements of teaching performance.* But I’m wary of what appear to be some pretty big limitations in current implementations.

It’s interesting reading the teacher comments on the LA Times’ teacher value added database, because many teachers appear to have a similar view – conceptually supportive, but wary of caveats and cognizant of many data problems. (Interestingly, the LAT ratings seem to have higher year-on-year and cross subject rating reliability, much more like I would expect a useful metric to behave. I can only browse incrementally though, so seeing the full dataset rather than individual samples might reveal otherwise.)

My takeaways on the value added measurements:

I think the bigger issues have more to do with the content of the value added measurements rather than their precision. There’s nothing mysterious about what teacher value added measures. It’s very explicitly the teacher-level contribution to year-on-year improvement in student standardized test scores. Any particular measurement might contain noise and bias, but if you could get rid of those, there are still some drawbacks to the metric.

  • Testing typically emphasizes only math and English, maybe science, and not art, music, and a long list of other things. This is broadly counterproductive for life, but also even narrowly for learning math and English, because you need some real-world subject matter in order to have interesting problems to solve and things to write about.
  • Life is a team sport. Teaching is, or should be, too. (If you doubt this, watch a few episodes of a reality show like American Chopper and ponder whether performance would be more enhanced by better algebra skills, or better cooperation, communication and project management skills. Then ponder whether building choppers is much different from larger enterprises, like the Chunnel.) We should be thinking about performance accordingly.
    • A focus at the student and teacher level ignores the fact that school system-level dynamics are most likely the biggest opportunity for improvement.
    • A focus on single-subject year-on-year improvements means that teachers are driven to make decisions with a 100% time discount rate, and similar total disregard for the needs of other teachers’ classes.**
  • Putting teachers in a measurement-obsessed command-and-control environment is surely not the best way to attract high-quality teachers.
  • It’s hard to see how putting every student through the same material at the same pace can be optimal.
  • It doesn’t make sense to put too much weight on standardized test scores, when the intersection between those and more general thinking/living skills is not well understood.

If no teachers are ever let go for poor performance, that probably signals a problem. In fact, it’s likely a bigger problem if teacher performance measurement (generally, not just VAM) is noisy, because bad teachers can get tenure by luck. If VAM helps with the winnowing process, that might be a useful function.

But it seems to me that the power of value added modeling is being wasted by this musical chairs*** mentality. The real challenge in teaching is not to decrease the stock of bad teachers. It’s to increase the stock of good ones, by attracting new ones, retaining the ones we have, and helping all of them learn to improve. Of course, that might require something more scarce than seats in musical chairs – money.


* A friend and school board member in semi-rural California was an unexpected fan of No Child Left Behind testing requirements, because objective measurements were the only thing that finally forced her district to admit that, well, they kind of sucked.

** A friend’s son, a math teacher, proposed to take a few days out of the normal curriculum to wrap up some loose ends from prior years. He thought this would help students to cement the understanding of foundational topics that they’d imperfectly mastered. Management answered categorically that there could be no departures from the current year material, needed to cover standardized test requirements. He defied them and did it, but only because he knew that it would take the district a year to fire him, and he was quitting anyway.

*** Musical chairs has to be one of the worst games you could possibly teach to children. We played it fairly regularly in elementary school.