AI is killing us now

I’ve been watching the debate over AI with some amusement, as if it were some other planet at risk. The Musk-Zuckerberg kerfuffle is the latest installment. Ars Technica thinks they’re both wrong:

At this point, these debates are largely semantic.

I don’t see how anyone could live through the last few years and fail to notice that networking and automation have enabled an explosion of fake news, filter bubbles and other information pathologies. These are absolutely policy relevant, and smarter AI is poised to deliver more of what we need least. The problem is here now, not from some impending future singularity.

Ars gets one point sort of right:

Plus, computer scientists have demonstrated repeatedly that AI is no better than its datasets, and the datasets that humans produce are full of errors and biases. Whatever AI we produce will be as flawed and confused as humans are.

I don’t think the data is really the problem; the real trouble lies in the assumptions applied to the data and the context in which that happens. In any case, automating flawed aspects of ourselves is not benign!

Here’s what I think is going on:

AI, and more generally computing and networks, are doing some good things. More data and computing power accelerate the discovery of truth. But truth is still elusive and expensive. On the other hand, AI is making bullsh!t really cheap (pardon the technical jargon). There are many mechanisms by which this occurs:

These amplifiers of disinformation serve increasingly concentrated wealth and power elites that are isolated from their negative consequences, and benefit from fueling the process. We wind up wallowing in a sea of information pollution (the deadliest among the sins of managing complex systems).

As BS becomes more prevalent, various reinforcing mechanisms start kicking in. Accepted falsehoods erode critical thinking abilities, and promote the rejection of ideas like empiricism that were the foundation of the Enlightenment. The proliferation of BS requires more debunking, taking time away from discovery. A general erosion of trust makes it harder to solve problems, opening the door for opportunistic rent-seeking non-solutions.

I think it’s a matter of survival for us to do better at critical thinking, so we can shift the balance between truth and BS. That might be one area where AI could safely assist. We have other assets as well, like the explosion of online learning opportunities. But I think we also need some cultural solutions, like better management of trust and anonymity, brakes on concentration, sanctions for lying, rewards for prediction, and more time for reflection.

The survival value of wrong beliefs

NPR has an alarming piece on school science.

She tells her students — like Nick Gurol, whose middle-schoolers believe the Earth is flat — that, as hard as they try, science teachers aren’t likely to change a student’s misconceptions just by correcting them. Gurol says his students got the idea of a flat planet from basketball star Kyrie Irving, who said as much on a podcast.

“And immediately I start to panic. How have I failed these kids so badly they think the Earth is flat just because a basketball player says it?” He says he tried reasoning with the students and showed them a video. Nothing worked.

“They think that I’m part of this larger conspiracy of being a round-Earther. That’s definitely hard for me because it feels like science isn’t real to them.”

For cases like this, Yoon suggests teachers give students the tools to think like a scientist. Teach them to gather evidence, check sources, deduce, hypothesize and synthesize results. Hopefully, then, they will come to the truth on their own.

This called to mind a post from way back, in which I considered reasons for the survival of antiscientific views.

It’s basically a matter of evolution. When crazy ideas negatively affect survival, they die out. But evolutionary forces are vastly diminished under some conditions, or even point the wrong way:

  1. Non-experimental science (reliance on observations of natural experiments; no controls or randomized assignment)
  2. Infrequent replication (few examples within the experience of an individual or community)
  3. High noise (more specifically, low signal-to-noise ratio)
  4. Complexity (nonlinearity, integrations or long delays between cause and effect, multiple agents, emergent phenomena)
  5. “Unsalience” (you can’t touch, taste, see, hear, or smell the variables in question)
  6. Cost (there’s some social or economic penalty imposed by the policy implications of the theory)
  7. Commons (the risk of being wrong accrues to society more than the individual)

These are, incidentally, some of the same circumstances that make medical trials difficult, such that most papers are false.

Consider the flat earth idea. What cost accrues to students who hold this belief? None whatsoever, I think. A flat earth model will make terrible predictions of all kinds of things, but students are not making or relying on such predictions. The roundness of the earth is obviously not salient. So really, the only survival value that matters to students is the benefit of tribal allegiance.

If there are intertemporal dynamics, the situation is even worse. For any resource or capability investment problem, there’s worse-before-better behavior. Recovering depleted fish stocks requires diminished effort, and less to eat, in the near term. If a correct belief implies good long-run stock management, adherents of the incorrect belief will have an advantage in the short run. You can’t count on selection weeding out the “dumb tribes” for planetary-scale problems, because we’re all in one.
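The worse-before-better pattern is easy to reproduce with a toy model. The sketch below is only an illustration, not a calibrated fishery model: it assumes logistic regrowth and a catch proportional to effort times stock, and every number in it (`growth_rate`, `capacity`, the starting stock, and the two effort levels) is made up for the example.

```python
DT = 0.25  # years per Euler step (assumed)

def simulate_fishery(effort, years=40, stock=200.0,
                     growth_rate=0.5, capacity=1000.0):
    """Euler-integrate a logistic fish stock; return the catch rate at each step."""
    catches = []
    for _ in range(int(years / DT)):
        catch = effort * stock  # harvest is proportional to effort and stock
        regrowth = growth_rate * stock * (1.0 - stock / capacity)
        catches.append(catch)
        stock += DT * (regrowth - catch)
    return catches

# Hypothetical policies: keep fishing hard, or cut effort to let the stock recover.
status_quo = simulate_fishery(effort=0.40)
restraint = simulate_fishery(effort=0.25)

# First moment at which the restrained fleet out-catches the status quo fleet.
crossover_step = next(i for i, (r, s) in enumerate(zip(restraint, status_quo)) if r > s)

print("catch in year 0:  restraint %.0f vs status quo %.0f" % (restraint[0], status_quo[0]))
print("catch in year 40: restraint %.0f vs status quo %.0f" % (restraint[-1], status_quo[-1]))
print("restraint pulls ahead after about %.1f years" % (crossover_step * DT))
```

Under these assumptions the restrained fleet eats less at first and more later. A tribe judged only on the first few years would conclude that restraint does not work.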

This seems like a pretty intractable problem. If there’s a way out, it has to be cultural. If there were a bit more recognition of the value of making correct predictions, the halo of that would spill over to diminish the attractiveness of silly theories. That’s a case that ought to be compelling for basketball fans. Who wants to play on a team that can’t predict what the opponents will do, or how the ball will bounce?

System 3 thinking

There was lots of talk of dual process theory at the 2017 System Dynamics Conference. Nelson Repenning discussed it in his plenary presentation. The Donella Meadows Award paper investigated the effects on stock-flow task performance of priming subjects to think in System 2:

The dual-process theory and understanding of stocks and flows

Arash Baghaei Lakeh and Navid Ghaffarzadegan

Recent evidence suggests that using the analytic mode of thinking (System 2) can improve people’s performance in stock–flow (SF) tasks. In this paper, we further investigate the effects by implementing several different interventions in two studies. First, we replicate a previous finding that answering analytical questions before the SF task approximately doubles the likelihood of answering the stock questions correctly. We also investigate effects of three other interventions that can potentially prime participants to use their System 2. Specifically, the first group is asked to justify their response to the SF task; the second group is warned about the difficulty of the SF task; and the third group is offered information about cognitive biases and the role of the analytic mode of thinking. We find that the second group showed a statistically significant improvement in their performance. We claim that there are simple interventions that can modestly improve people’s response in SF tasks.

Dual process refers to the idea that there are two systems of thinking at work in our minds. System 1 is fast, automatic intuition. System 2 is slow, rational reasoning.

I’ve lost track of the conversation, but some wag at the conference (not me; possibly Arash) coined the term “System 3” for model-assisted thinking.

In a sense, any reasoning is “model-assisted,” but I think there’s an important distinction between purely mental reasoning and reasoning with a formal (usually computerized) modeling method like a dynamic simulation or even a spreadsheet.

When we reason in our heads, we have to simultaneously (A) describe the structure of the problem, (B) predict the behavior implied by the structure, and (C) test the structure against available information. Mentally, we’re pretty good at A, but pretty bad at B and C. No one can reliably simulate even a low-order dynamic system in their head, and there are too many model checks against data and thought experiments (like extreme conditions) to “run” without help.

System 3’s great weakness is that it takes still more time than using System 2. But it makes up for that in three ways. First, reliable predictions and tests of behavior reveal misconceptions about the problem/system structure that are otherwise inaccessible, so the result is higher quality. Second, the model is shareable, so it’s easier to convey insights to other stakeholders who need to be involved in a solution. Third, formal models can be reused, which lowers the effective cost of an application.
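To make the A/B/C distinction concrete, here is a minimal sketch of the smallest possible “System 3” aid: a one-stock model. The flow numbers are invented for illustration and `simulate_stock` is a hypothetical helper, not anything from the conference papers. Describing the structure (A) is one line; predicting the behavior (B) and checking an extreme condition (C) are what the computer does for us.

```python
def simulate_stock(inflow, outflow, initial_stock=100.0):
    """Accumulate net flow period by period; return the stock trajectory."""
    stock = initial_stock
    trajectory = [stock]
    for flow_in, flow_out in zip(inflow, outflow):
        stock += flow_in - flow_out  # (A) the whole structure: stock integrates net flow
        trajectory.append(stock)
    return trajectory

# Made-up flows: inflow rises and falls, outflow is constant.
inflow = [4, 8, 12, 16, 20, 16, 12, 8, 4, 0]
outflow = [10] * 10

# (B) Predict behavior: the stock does not peak when inflow peaks.
trajectory = simulate_stock(inflow, outflow)
peak_inflow_period = inflow.index(max(inflow))
peak_stock_time = trajectory.index(max(trajectory))
print("inflow peaks in period", peak_inflow_period)
print("stock peaks at time", peak_stock_time)

# (C) Test the structure with an extreme condition: balanced flows must leave
# the stock unchanged.
steady = simulate_stock([10] * 10, [10] * 10)
print("stock under balanced flows:", set(steady))
```

The common intuitive answer, that the stock tracks the inflow, fails here: the stock keeps rising for as long as inflow exceeds outflow, so it peaks later than the inflow does.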

But how do you manage that “still more time” problem? Consider this advice:

I discovered a simple solution to making challenging choices more efficiently at an offsite last week with the CEO and senior leadership team of a high tech company. They were facing a number of unique, one-off decisions, the outcomes of which couldn’t be accurately predicted.

These are precisely the kinds of decisions which can linger for weeks, months, or even years, stalling the progress of entire organizations. …

But what if we could use the fact that there is no clear answer to make a faster decision?

“It’s 3:15pm,” He [the CEO] said. “We need to make a decision in the next 15 minutes.”

“Hold on,” the CFO responded, “this is a complex decision. Maybe we should continue the conversation at dinner, or at the next offsite.”

“No,” The CEO was resolute, “We will make a decision within the next 15 minutes.”

And you know what? We did.

Which is how I came to my third decision-making method: use a timer.

I’m in favor of using a timer to put a stop to dithering. Certainly a body with scarce time should move on when it perceives that it can’t add value. But this strikes me as a potentially costly reversion to System 1.

If a problem is strategic enough to make it to the board, but the board sees a landscape that prevents a clear decision, it ought to be straightforward to articulate why. Are there tradeoffs that make the payoff surface flat? The timer is a sensible response to that, because the decision doesn’t require precision. Are there competing feedback loops that suggest different leverage points, for which no one can agree about the gain? In that case, the consequence of an error could be severe, so the default answer should include a strategy for detection and correction. One ought to have a way to discriminate between these two situations, and a simple application of System 3 might be just the tool.


The intuitive mind is a gag gift

I saw Einstein quoted yesterday, “The intuitive mind is a sacred gift and the rational mind is a faithful servant. We have created a society that honors the servant and has forgotten the gift.”

I wondered what he meant, because I think of the intuitive mind as a treacherous friend. We can’t do without it, because we have too many decisions to make. You’d never get out of bed if everything had to be evaluated rationally. But at the same time, whatever heuristics are going on in there are the same ones that,

… and indulge in dozens of other biases. I think they’re also why Lightroom’s face recognition mixes me up with my dog.

How could Einstein revere intuition above reason? Perhaps he relished the intuitive guess at an equation, or some kind of Occam’s Razor argument about simplicity and beauty?

Well, it appears that the answer is simple, but not too simple. He didn’t say it.

Why so many incompetent leaders, period?

The HBR has a nice article asking, Why Do So Many Incompetent Men Become Leaders? Some excerpts:

In my view, the main reason for the uneven management sex ratio is our inability to discern between confidence and competence. That is, because we (people in general) commonly misinterpret displays of confidence as a sign of competence, we are fooled into believing that men are better leaders than women. In other words, when it comes to leadership, the only advantage that men have over women … is the fact that manifestations of hubris — often masked as charisma or charm — are commonly mistaken for leadership potential, and that these occur much more frequently in men than in women.

The truth of the matter is that pretty much anywhere in the world men tend to think that they that are much smarter than women. Yet arrogance and overconfidence are inversely related to leadership talent — the ability to build and maintain high-performing teams, and to inspire followers to set aside their selfish agendas in order to work for the common interest of the group.

The paradoxical implication is that the same psychological characteristics that enable male managers to rise to the top of the corporate or political ladder are actually responsible for their downfall. In other words, what it takes to get the job is not just different from, but also the reverse of, what it takes to do the job well. …

In fact, most leaders — whether in politics or business — fail. That has always been the case: the majority of nations, companies, societies and organizations are poorly managed, as indicated by their longevity, revenues, and approval ratings, or by the effects they have on their citizens, employees, subordinates or members. Good leadership has always been the exception, not the norm.

Gender may amplify the problem, but I think its roots lie much deeper. We have a general surplus of incompetence across all walks of life.

So, how does this unhappy situation persist? One would hope that evolution would take care of this – that companies or nations that were systematically fooled by confidence over substance would be naturally selected out of the population. But that doesn’t seem to happen.

I think the explanation lies in the weaknesses of our mental models (and failure to refine them with formal models), and therefore our inability to attribute success and failure to decisions, in hindsight or in prospect. Here’s the purest expression of this line of thinking I’ve seen:

Why I Switched My Endorsement from Clinton to Trump

1. Things I Don’t Know: There are many things I don’t know. For example, I don’t know the best way to defeat ISIS. Neither do you. I don’t know the best way to negotiate trade policies. Neither do you. I don’t know the best tax policy to lift all boats. Neither do you. …. So on most political topics, I don’t know enough to make a decision. Neither do you, but you probably think you do.

3. Party or Wake: It seems to me that Trump supporters are planning for the world’s biggest party on election night whereas Clinton supporters seem to be preparing for a funeral. I want to be invited to the event that doesn’t involve crying and moving to Canada. (This issue isn’t my biggest reason.)

Scott Adams, Dilbert creator

If you can’t predict which leader’s proposals or methods will work, why not go with the ones that sound the best? Or, if you can’t figure out how to grow the pie for everyone, why not at least choose the tribal affiliation that gives you the best chance at a slice of patrimony?

Still, at the end of the day, the honeymoon is over, and the effects of decisions should come home to roost, right? Not necessarily. Even after the fact, attribution of causality in dynamic systems is difficult, because causes and effects are separated in space and time. So, you can’t learn to do better by simple pattern matching; you have to understand the structure that’s producing behavior. Firm policies and election rules worsen the problem by rotating people around, so that they can launch initiatives and be gone before the consequences are observed, defeating evolution.

John Sterman and Nelson Repenning explain in another context:

The Capability Trap

The capability trap arises from the interactions between judgmental biases and the physical structure of work processes. For example, machine operators or design engineers facing a shortfall may initially work harder …, do more rework …, or focus on throughput …, all of which reduce the time available for improvement. These responses are tempting because they yield immediate gains, while their costs are distant in time and space, uncertain, and hard to detect. But, while throughput improves in the short run, the reduction in time dedicated to learning causes process capability to decline. Eventually, workers find themselves again falling short of their throughput target, forcing a further shift toward working and away from improving. Instead of making up for the improvement activity they skipped earlier, their own past actions, by causing the reinvestment loops … to work as vicious cycles, trap them in a downward spiral of eroding process capability, increasing work hours, and less and less time for improvement.

Misperceptions of Feedback

While the literature and field data support the links in the model, our account of the capability trap raises several questions. First, wouldn’t people recognize the existence of the reinforcing feedbacks that create the trap and take actions to avoid it? Second, if they find themselves stuck in the trap, wouldn’t people learn to escape it by making appropriate short-term sacrifices? Studies of decision making in dynamic environments suggest that such learning is far from …

Consider the outcome feedback received from a decision to spend more time working and less on improvement. Performance quickly increases, producing a clear, salient, unambiguous outcome. In contrast, the negative consequences of this action—the decline in process capability—take time, are hard to observe, and may have ambiguous interpretations. In experiments ranging from running a simulated production and distribution system (Sterman, 1989) to fighting a simulated forest fire (Brehmer, 1992) or managing a simulated fishery (Moxnes, 1999), subjects have been shown to grossly overweight the short-run positive benefits of their decisions while ignoring the long-run, negative consequences. Participants in these experiments produce wildly oscillating production rates, allow their fire-fighting headquarters to burn down, and find their fleets idled after overexploiting their fisheries.

Once caught in the capability trap, people are also unlikely to learn to escape it. A new improvement program, by reducing the time available for throughput, causes an immediate and salient drop in performance, while its benefits are uncertain, delayed, difficult to assess, and may be insufficient to switch the reinforcing feedbacks to virtuous cycles. People are likely to conclude that the improvement program they attempted does not work and should be abandoned.

Attribution Errors in Judging the Cause of Low Throughput

When choosing to emphasize first- or second-order improvement, managers must make a judgment about the causes of low process throughput. If they believe the cause of low performance lies in the physical structure of the process, they are likely to focus their efforts on process improvement. If, however, low throughput is thought to result from lack of worker effort or discipline, then managers are better off focusing on increasing the quantity of work. The cues people use to make causal attributions include temporal order, covariation, and contiguity in time and space (Einhorn and Hogarth, 1986). Attributing low throughput to inadequate worker effort is consistent with all these cues: …. Managers are thus likely to attribute a throughput shortfall to inadequate worker effort, even when the true causes are systemic process …

Managers’ tendency to attribute performance shortfalls to problems with the workforce rather than the production system is reinforced by the so-called fundamental attribution error, or dispositional bias. …. Existing research thus suggests that managers facing throughput gaps are likely to conclude that workers, not the process, are the cause of low throughput, reinforcing the bias against fundamental improvement.

As Sterman & Repenning go on to explain, these attribution errors are likely to become self-confirming, and to be institutionalized in organizational routines, leading to self-reinforcing organizational pathologies.

Blaming workers for productivity shortfalls that ultimately arise from the firm leadership’s failure to focus on process improvement is a lot like blaming poverty on the shortcomings of poor people, rather than their social environment, which subjects them to poor education, predatory monopolies and disproportionate criminal and environmental burdens. Programs that focus exclusively on motivating (or punishing) the impoverished are at least as naive as those that seek to alleviate poverty through transfers of money without creating skills or opportunities.

Back to Scott Adams, one argument in favor of unfounded overconfidence remains:

6. Persuasion: Economies are driven by psychology. If you expect things to go well tomorrow, you invest today, which causes things to go well tomorrow, as long as others are doing the same. The best kind of president for managing the psychology of citizens – and therefore the economy – is a trained persuader. You can call that persuader a con man, a snake oil salesman, a carnival barker, or full of shit. It’s all persuasion. And Trump simply does it better than I have ever seen anyone do it.

Most of the job of president is persuasion. Presidents don’t need to understand policy minutia. They need to listen to experts and then help sell the best expert solutions to the public. Trump sells better than anyone you have ever seen, even if you haven’t personally bought into him yet. You can’t deny his persuasion talents that have gotten him this far.

Psychology is in steady state over any reasonably long time horizon, so it’s not really psychology that drives economic growth. Psychology is necessary, in that people have to feel that conditions are right for risk-taking, but it’s not sufficient. The real long run driver is innovation, embodied in people, technology and organizations. That means it’s also necessary that innovations work, so investments produce, GDP makes people happy, schools teach, infrastructure serves, and wars defeat more enemies than they create. Mere bluster does not get you those things.

So here’s the problem: the overconfidence that lends itself to persuasion has side effects:
  • It’s hostile to “listening to experts” (or to anyone).
  • It favors naive, simple causal attributions over inquiry into system structure.
  • It opposes learning from feedback, whether from constituents or objective measurements.
  • Confronted by adversity, it retreats into confirmation bias and threat rigidity.

So, overconfidence doesn’t make the economy grow faster. It just makes things go faster, whether paradise or a cliff lies ahead.

I don’t think firms or planets want to speed off a cliff, so we need to do better. It’s a tall order, but I think awareness of the problem is a good start. After that, it’s a hard road, but we all need to become better system scientists, and spend more time participating in governance and attempting to understand how systems work.

There’s reason for hope – not all firms fall into capability traps. Emulating those that succeed, we might start by investing some time in process improvement. The flawed processes by which we now pick leaders look like low-hanging fruit to me.

Another field ponders rationality

The reasoning criminal vs. Homer Simpson: conceptual challenges for crime science

A recent disciplinary offshoot of criminology, crime science (CS) defines itself as “the application of science to the control of crime.” One of its stated ambitions is to act as a cross-disciplinary linchpin in the domain of crime reduction. Despite many practical successes, notably in the area of situational crime prevention (SCP), CS has yet to achieve a commensurate level of academic visibility. The case is made that the growth of CS is stifled by its reliance on a model of decision-making, the Rational Choice Perspective (RCP), which is inimical to the integration of knowledge and insights from the behavioral, cognitive and neurosciences (CBNs).

What's your favorite cognitive bias?

Business Insider has a nifty compilation of cognitive biases, extracted from Wikipedia’s huge list.

It would be cool to identify the ones that involve dynamics, and identify a small conceptual model illustrating each one.

In SD, we often call these misperceptions of feedback, though one might also include failures to mentally simulate accumulation, which doesn’t require feedback. Some samples that jump to mind:

Not only the tragedy of the commons: misperceptions of feedback and policies for sustainable development

Drunker than intended: Misperceptions and information treatments

Capability traps and self-confirming attribution errors in the dynamics of process improvement

Modeling managerial behavior: Misperceptions of feedback in a dynamic decision making experiment

Explaining capacity overshoot and price war: misperceptions of feedback in competitive growth markets

Bathtub dynamics: initial results of a systems thinking inventory

What’s your favorite foible?

Is happiness bimodal? Why?

At the Balaton meeting, I picked up a report on happiness in Japan, Measuring National Well-Being – Proposed Indicators. There’s a lot of interesting material in it, but a figure on page 14 stopped me in my tracks:

The distribution of happiness scores is rather strongly bimodal, and has been stable that way for 30+ years.

There might be an obvious explanation: heterogeneity – perhaps women are happy, and men aren’t, or the reverse, or maybe a lot of people just like to answer “5”. But the same thing appears in some European countries:

Denmark is obscenely happy (time to look for an apartment in Copenhagen) but several other countries display the same dual peaks as Japan. I wouldn’t expect the same cultural dynamics, so what’s going on?

A tantalizing possibility is that this is the product of a dynamic system. But if so, its 1D representation would have to look something like,

That’s really rather weird, so perhaps it’s just an artifact (after all, bimodality doesn’t appear everywhere). But since happiness is largely a social phenomenon, it’s certainly plausible that the intersection of several feedbacks yields this behavior.
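One way to see what such a 1D system implies is a toy model with two stable equilibria separated by a tipping point. This is purely a sketch under assumed numbers: the cubic form of `happiness_rate` and the equilibria at 5, 6.5 and 8 on a 0 to 10 scale are hypothetical, chosen only to produce two peaks, and are not estimated from the survey data.

```python
import random

def happiness_rate(x, low=5.0, tipping=6.5, high=8.0):
    """Rate of change of a happiness score: stable at low and high, unstable at tipping."""
    return -(x - low) * (x - tipping) * (x - high)

def settle(x, noise=0.0, dt=0.01, steps=2000, rng=None):
    """Euler-integrate one individual's score, optionally with random shocks."""
    if rng is None:
        rng = random.Random(0)
    for _ in range(steps):
        shock = noise * rng.gauss(0.0, 1.0) * dt ** 0.5
        x += dt * happiness_rate(x) + shock
    return x

# People starting anywhere on the scale end up at one of two scores.
starts = [float(s) for s in range(1, 11)]
finals = [settle(x0) for x0 in starts]
print([round(v, 2) for v in finals])

# With shocks added (assumed noise level), scores spread around the two equilibria.
shared_rng = random.Random(42)
noisy = [settle(x0, noise=0.5, rng=shared_rng) for x0 in starts]
print([round(v, 1) for v in noisy])
```

Without noise, every starting point below the tipping point settles at the lower equilibrium and every one above it settles at the upper one, which is the kind of structure a persistent two-peaked distribution would need.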

I find it rather remarkable that no one has noted this – at least, Google Scholar fails me on the notion of bimodal happiness or subjective well-being. A similar phenomenon appears in text analysis of Twitter and other media.

Any theories?

Real estate appraisal – learning the wrong lesson from failure

I just had my house appraised for a refinance. The appraisal came in at least 20% below my worst-case expectation of market value. The basis of the judgment was comps, about which the best one could say is that they’re in the same county.

I could be wrong. But I think it more likely that the appraisal was rubbish. Why did this happen? I think it’s because real estate appraisal uses unscientific methods that would not pass muster in any decent journal, enabling selection bias and fudge factors to dominate any given appraisal.

When the real estate bubble was on the way up, the fudge factors all provided biased confirmation of unrealistically high prices. In the bust, appraisers got burned. They didn’t learn that their methods were flawed; rather they concluded that the fudge factors should point down, rather than up.

Here’s how appraisals work:

A lender commissions an appraisal. Often the appraiser knows the loan amount or prospective sale price (experimenters used to double-blind trials should be cringing in horror).

The appraiser eyeballs the subject property, and then looks for comparable sales of similar properties within a certain neighborhood in space and time (the “market window”). There are typically 4 to 6 of these, because that’s all that will fit on the standard appraisal form.

The appraiser then adjusts each comp for structure and lot size differences, condition, and other amenities. The scale of adjustments is based on nothing more than gut feeling. There are generally no adjustments for location or timing of sales, because that’s supposed to be handled by the neighborhood and market window criteria.

There’s enormous opportunity for bias, both in the selection of the comp sample and in the adjustments. By cherry-picking the comps and fiddling with adjustments, you can get almost any answer you want. There’s also substantial variance in the answer, but a single point estimate is all that’s ever reported.

Here’s how they should work:

The lender commissions an appraisal. The appraiser never knows the price or loan amount (though in practice this may be difficult to enforce).

The appraiser fires up a database that selects lots of comps from a wide neighborhood in time and space. Software automatically corrects for timing and location by computing spatial and temporal gradients. It also automatically computes adjustments for lot size, sq ft, bathrooms, etc. by hedonic regression against attributes coded in the database. It connects to utility and city information to determine operating costs – energy and taxes – to adjust for those.

The appraiser reviews the comps, but only to weed out obvious coding errors or properties that are obviously non-comparable for reasons that can’t be adjusted automatically, and visits the property to be sure it’s still there.

The answer that pops out has confidence bounds and other measures of statistical quality attached. As a reality check, the process is repeated for the rental market, to establish whether rent/price ratios indicate an asset bubble.

If those tests look OK, and the answer passes the sniff test, the appraiser reports a plausible range of values. Only if the process fails to converge does some additional judgment come into play.
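The statistical core of that process can be sketched in a few dozen lines. This is a minimal illustration, not any vendor’s or patent’s method: the comps are invented, the attribute set (floor area, bathrooms, lot size) is an assumption, and the helpers `solve`, `fit_hedonic` and `appraise` are hypothetical names. It leaves out the spatial and temporal gradients, and it uses a rough two-standard-error band where a proper t quantile would be wider with so few comps.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_hedonic(features, prices):
    """Ordinary least squares of price on attributes, with an intercept."""
    x = [[1.0] + list(row) for row in features]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * p for r, p in zip(x, prices)) for i in range(k)]
    beta = solve(xtx, xty)
    residuals = [p - sum(b * v for b, v in zip(beta, r)) for r, p in zip(x, prices)]
    sigma = math.sqrt(sum(e * e for e in residuals) / (len(x) - k))
    return beta, sigma, xtx

def appraise(beta, sigma, xtx, subject, z=2.0):
    """Return (low, estimate, high): a rough prediction band for the subject property."""
    x0 = [1.0] + list(subject)
    estimate = sum(b * v for b, v in zip(beta, x0))
    leverage = sum(u * v for u, v in zip(x0, solve(xtx, x0)))  # x0' (X'X)^-1 x0
    half_width = z * sigma * math.sqrt(1.0 + leverage)
    return estimate - half_width, estimate, estimate + half_width

# Invented comps: (thousand sq ft, bathrooms, lot acres) and sale price in $k.
comps = [((1.0, 1, 0.2), 182), ((1.5, 2, 0.3), 245), ((2.0, 2, 0.5), 310),
         ((2.5, 3, 0.4), 368), ((3.0, 3, 1.0), 445), ((1.8, 1, 0.8), 270),
         ((2.2, 2, 0.6), 335), ((1.2, 1, 0.5), 205)]
beta, sigma, xtx = fit_hedonic([f for f, _ in comps], [p for _, p in comps])
low, estimate, high = appraise(beta, sigma, xtx, subject=(2.0, 2, 0.4))
print("appraised value: %.0f (plausible range %.0f to %.0f) $k" % (estimate, low, high))
```

The point of the sketch is the shape of the output: a range with a stated basis, rather than a single number adjusted by gut feeling.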

There are several patents on such a process, but no widespread implementation. Most of the time, it would probably be cheaper to do things this way, because less appraiser time would be needed for ultimately futile judgment calls. Perhaps it would exceed the skillset of the existing population of appraisers though.

It’s bizarre that lenders don’t expect something better from the appraisal industry. They lose money from current practices on both ends of market cycles. In booms, they (later) suffer excess defaults. In busts, they unnecessarily forgo viable business.

To be fair, fully automatic mass appraisal like Zillow and Trulia doesn’t do very well in my area. I think that’s mostly lack of data access, because they seem to cover only a small subset of the market. Perhaps some human intervention is still needed, but that human intervention would be a lot more effective if it were informed by even the slightest whiff of statistical reasoning and leveraged with some data and computing power.

Update: on appeal, the appraiser raised our valuation 27.5%. Case closed.