Real estate appraisal – learning the wrong lesson from failure

I just had my house appraised for a refinance. The appraisal came in at least 20% below my worst-case expectation of market value. The basis of the judgment was comps, about which the best one could say is that they’re in the same county.

I could be wrong. But I think it more likely that the appraisal was rubbish. Why did this happen? I think it’s because real estate appraisal uses unscientific methods that would not pass muster in any decent journal, enabling selection bias and fudge factors to dominate any given appraisal.

When the real estate bubble was on the way up, the fudge factors all provided biased confirmation of unrealistically high prices. In the bust, appraisers got burned. They didn’t learn that their methods were flawed; rather they concluded that the fudge factors should point down, rather than up.

Here’s how appraisals work:

A lender commissions an appraisal. Often the appraiser knows the loan amount or prospective sale price (experimenters used to double-blind trials should be cringing in horror).

The appraiser eyeballs the subject property, and then looks for comparable sales of similar properties within a certain neighborhood in space and time (the “market window”). There are typically 4 to 6 of these, because that’s all that will fit on the standard appraisal form.

The appraiser then adjusts each comp for structure and lot size differences, condition, and other amenities. The scale of adjustments is based on nothing more than gut feeling. There are generally no adjustments for location or timing of sales, because that’s supposed to be handled by the neighborhood and market window criteria.

There’s enormous opportunity for bias, both in the selection of the comp sample and in the adjustments. By cherry-picking the comps and fiddling with adjustments, you can get almost any answer you want. There’s also substantial variance in the answer, but a single point estimate is all that’s ever reported.

Here’s how they should work:

The lender commissions an appraisal. The appraiser never knows the price or loan amount (though in practice this may be difficult to enforce).

The appraiser fires up a database that selects lots of comps from a wide neighborhood in time and space. Software automatically corrects for timing and location by computing spatial and temporal gradients. It also automatically computes adjustments for lot size, sq ft, bathrooms, etc. by hedonic regression against attributes coded in the database. It connects to utility and city information to determine operating costs – energy and taxes – to adjust for those.

The appraiser reviews the comps, but only to weed out obvious coding errors or properties that are obviously non-comparable for reasons that can’t be adjusted automatically, and visits the property to be sure it’s still there.

The answer that pops out has confidence bounds and other measures of statistical quality attached. As a reality check, the process is repeated for the rental market, to establish whether rent/price ratios indicate an asset bubble.

If those tests look OK, and the answer passes the sniff test, the appraiser reports a plausible range of values. Only if the process fails to converge does some additional judgment come into play.

There are several patents on such a process, but no widespread implementation. Most of the time, it would probably be cheaper to do things this way, because less appraiser time would be needed for ultimately futile judgment calls. Perhaps it would exceed the skillset of the existing population of appraisers though.

It’s bizarre that lenders don’t expect something better from the appraisal industry. They lose money from current practices on both ends of market cycles. In booms, they (later) suffer excess defaults. In busts, they unnecessarily forgo viable business.

To be fair, fully automatic mass appraisal like Zillow and Trulia doesn’t do very well in my area. I think that’s mostly lack of data access, because they seem to cover only a small subset of the market. Perhaps some human intervention is still needed, but that human intervention would be a lot more effective if it were informed by even the slightest whiff of statistical reasoning and leveraged with some data and computing power.

Update: on appeal, the appraiser raised our valuation 27.5%. Case closed.

Alternate perceptions of time

An interesting tidbit from Science:

Where Time Goes Up and Down

Dennis Normile

In Western cultures, the future lies ahead; the past is behind us. These notions are embedded in both gestures and spoken metaphors (looking forward to next year or back over the past year). A forward hand motion typically accompanies talk of the future; references to the past often bring a wave over the shoulder.

It is hard for most Westerners to conceive of other ways of conceptualizing time. But in 2006, Rafael Núñez, a cognitive scientist at the University of California, San Diego, reported that for the Aymara, an ethnic group of about 2 million people living in the Andean highlands, in both spoken and gestural terms, the future is unseen and conceived as being behind the speaker; the past, since it has been witnessed, is in front. They point behind themselves when discussing the future. And when talking about the past, Aymara gesture farther in front of them the more distant the event ….

At the Tokyo Evolutionary Linguistics Forum, Núñez presented another example of unusual thinking—and gesturing—about time: The Yupno people, who inhabit a remote valley in Papua New Guinea, think of time topographically. No matter which way a speaker is facing, he or she will gesture uphill when discussing the future and point downhill when talking about the past. …

I like the Aymara approach, with the future unseen behind the speaker. I bet there aren’t any Aymara economic models assuming perfect foresight as a model of behavior.

Where Time Goes Up and Down

Dennis Normile

Bathtub Statistics

The pitfalls of pattern matching don’t just apply to intuitive comparisons of the behavior of associated stocks and flows. They also apply to statistics. This means, for example, that a linear regression like

stock = a + b*flow + c*time + error

is likely to go seriously wrong. That doesn’t stop such things from sneaking into the peer reviewed literature though. A more common quasi-statistical error is to take two things that might be related, measure their linear trends, and declare the relationship falsified if the trends don’t match. This bogus reasoning remains a popular pastime of climate skeptics, who ask, how could temperature go down during some period when emissions went up? (See this example.) This kind of naive naive statistical reasoning, with static mental models of dynamic phenomena, is hardly limited to climate skeptics though.

Given the dynamics, it’s actually quite easy to see how such things can occur. Here’s a more complete example of a realistic situation:

At the core, we have the same flow driving a stock. The flow is determined by a variety of test inputs , so we’re still not worrying about circular causality between the stock and flow. There is potentially feedback from the stock to an outflow, though this is not active by default. The stock is also subject to other random influences, with a standard deviation given by Driving Noise SD. We can’t necessarily observe the stock and flow directly; our observations are subject to measurement error. For purposes that will become evident momentarily, we might perform some simple manipulations of our measurements, like lagging and differencing. We can also measure trends of the stock and flow. Note that this still simplifies reality a bit, in that the flow measurement is instantaneous, rather than requiring its own integration process as physics demands. There are no complications like missing data or unequal measurement intervals.

Now for an experiment. First, suppose that the flow is random (pink noise) and there are no measurement errors, driving noise, or outflows. In that case, you see this:

Once could actually draw some superstitious conclusions about the stock and flow time series above by breaking them into apparent episodes, but that’s quite likely to mislead unless you’re thinking explicitly about the bathtub. Looking at a stock-flow scatter plot, it appears that there is no relationship:

Of course, we know this is wrong because we built the model with perfect Flow->Stock causality. The usual statistical trick to reveal the relationship is to undo the integration by taking the first difference of the stock data. When you do that, plotting the change in the stock vs. the flow (lagged one period to account for the differencing), the relationship reappears: Continue reading “Bathtub Statistics”

Bathtub Dynamics

Failure to account for bathtub dynamics is a basic misperception of system structure, that occurs even in simple systems that lack feedback. Research shows that pattern matching, a common heuristic, leads even highly educated people to draw incorrect conclusions about systems as simple as the entry and exit of people in a store.

This can occur in any stock-flow system, which means that it’s ubiquitous. Here’s the basic setup:

Replace “Flow” and “Stock” with your favorite concepts – income and bank balance, sales rate and installed base, births and rabbits, etc. Obviously the flow causes the stock – by definition, the flow rate is the rate of change of the stock level. There is no feedback here; just pure integration, i.e. the stock accumulates the flow.

The pattern matching heuristic attempts to detect causality, or make predictions about the future, by matching the temporal patterns of cause and effect. So, naively, a pattern matcher expects to see a step in the stock in response to a step in the flow. But that’s not what happens:

Pattern matching fails because we shouldn’t expect the patterns to match through an integration. Above, the integral of the step ( flow = constant ) is a ramp ( stock = constant * time ). Other patterns are possible. For example, a monotonically decreasing cause (flow) can yield an increasing effect (stock), or even nonmonotonic behavior if it crosses zero: Continue reading “Bathtub Dynamics”

What a real breakthrough might look like

It’s possible that a techno fix will stave off global limits indefinitely, in a Star Trek future scenario. I think it’s a bad idea to rely on it, because there’s no backup plan.

But it’s equally naive to think that we can return to some kind of low-tech golden age. There are too many people to feed and house, and those bygone eras look pretty ugly when you peer under the mask.

But this is a false dichotomy.

Some techno/growth enthusiasts talk about sustainability as if it consisted entirely of atavistic agrarian aspirations. But what a lot of sustainability advocates are after, myself included, is a high-tech future that operates within certain material limits (planetary boundaries, if you will) before those limits enforce themselves in nastier ways. That’s not really too hard to imagine; we already have a high tech economy that operates within limits like the laws of motion and gravity. Gravity takes care of itself, because it’s instantaneous. Stock pollutants and resources don’t, because consequences are remote in time and space from actions; hence the need for coordination. Continue reading “What a real breakthrough might look like”

Your gut may be leading you astray

An interesting comment on rationality and conservatism:

I think Sarah Palin is indeed a Rorschach test for Conservatives, but it’s about much than manners or players vs. kibbitzes – it’s about what Conservativsm MEANS.

The core idea behind Conservatism is that most of human learning is done not by rational theorizing, but by pattern recognition. Our brain processes huge amounts of data every second, and most information we get out of it is in the form of recognized patterns, not fully logical theories. It’s fair to say that 90% of our knowledge is in patterns, not in theories.

This pattern recognition is called common sense, and over generations, it’s called traditions, conventions etc. Religion is usually a carrier meme for these evolved patterns. It’s sort of an evolutionary process, like a genetic algorithm.

Liberals, Lefties and even many Libertarians want to use only 10% of the human knowledge that’s rational. And because our rational knowledge cannot yet fully explain neither human nature in itself nor everything that happens in society, they fill the holes with myths like that everybody is born good and only society makes people bad etc.

Conservatives are practical people who instinctively recognize the importance of evolved patterns in human learning: because our rational knowledge simply isn’t enough yet, these common sense patterns are our second best option to use. And to use these patterns effectively you don’t particularly have to be very smart i.e. very rational. You have to be _wise_ and you have to have a good character: you have to set hubris and pride aside and be able to accept traditions you don’t fully understand.

Thus, for a Conservative, while smartness never hurts, being wise and having a good character is more important than being very smart. Looking a bit simple simply isn’t a problem, you still have that 90% of knowledge at hand.

Anti-Palin Conservatives don’t understand it. They think Conservativism is about having different theories than the Left, they don’t understand that it’s that theories and rational knowledge isn’t so important.

(via Rabbett Run)

A possible example of the writer’s perspective at work is provided by survey research showing that Tea Partiers are skeptical of anthropogenic climate change (established by models) but receptive to natural variation (vaguely, patterns), and they’re confident that they’re well-informed about it in spite of evidence to the contrary. Another possible data point is conservapedia’s resistance to relativity, which is essentially a model that contradicts our Newtonian common sense.

As an empirical observation, this definition of conservatism seems plausible at first. Humans are fabulous pattern recognizers. And, there are some notable shortcomings to rational theorizing. However, as a normative statement – that conservatism is better because of the 90%/10% ratio, I think it’s seriously flawed.

The quality of the 90% is quite different from the quality of the 10%. Theories are the accumulation of a lot of patterns put into a formal framework that has been shared and tested, which at least makes it easy to identify the theories that fall short. Common sense, or wisdom or whatever you want to call it, is much more problematic. Everyone knows the world is flat, right?

Sadly, there’s abundant evidence that our evolved heuristics fall short in complex systems. Pattern matching in particular falls short even in simple bathtub systems. Inappropriate mental models and heuristics can lead to decisions that are exactly the opposite of good management, even when property rights are complete; noise only makes things worse.

Real common sense would have the brains to abdicate when faced with situations, like relativity or climate change, where it was clear that experience (low velocities, local weather) doesn’t provide any patterns that are relevant to the conditions under consideration.

After some reflection, I think there’s more than pattern recognition to conservatism. Liberals, anarchists, etc. are also pattern matchers. We all have our own stylized facts and conventional wisdom, all of which are subject to the same sorts of cognitive biases. So, pattern matching doesn’t automatically lead to conservatism. Many conservatives don’t believe in global warming because they don’t trust models, yet observed warming and successful predictions of models from the 70s (i.e. patterns) also don’t count. So, conservatives don’t automatically respond to patterns either.

In any case, running the world by pattern recognition alone is essentially driving by looking in the rearview mirror. If you want to do better, i.e. to make good decisions at turning points or novel conditions, you need a model.

Bananas, vomit and behavioral economics

I just ran across a nice series of videos and transcripts on behavioral decision making, heuristics and biases, psychology and economics, with Nobel Prize winner Daniel Kahneman, Dick Thaler and other masters:

You have to watch the first to work out the meaning of my strange title. I can’t embed, so head over to Edge to view, where other interesting links will pop up.

More power of personal feedback

Now that I’ve dumped on emerging behavioral feedback technologies, perhaps I should share a personal success story, in which measurement technology played a key role.

Ten years ago, a routine test revealed that my cholesterol was 280 mg/dl, and even higher in a confirmation test. That’s not instant death, but it’s bad. NIH calls <200 desirable, and many argue for even lower levels.

This was a surprise, because I was getting a fair amount of exercise and eating healthier than the typical American diet. I suspect that their must be some genetic component.

Without any discussion, my doctor handed me a prescription for Lipitor. Now, I liked that doctor, and I know he was smart because we’d just had an interesting conversation about wavelet analysis of time series data in biomedical research. But I think he was operating under the assumption that there was no potential for improvement from behavior change. This idea seems to grip much of the medical profession, and creates nasty self-fulfilling prophecy and eroding goals dynamics.

I decided that I didn’t want to take statins for the rest of my (hopefully long) life, so with the aid of spousal prodding and planning, I eliminated all cholesterol and saturated fats (essentially all animal products) from my diet. I was quickly below 200, and then made more gradual progress to a range of about 160 to 180.

Interestingly, since then I’ve also cut out a lot of carbohydrates, because the rest of my family is gluten intolerant, which takes the fun out of bread and pasta. My cholesterol is now lower than ever, 149 at last check, in spite of adding eggs, a big dietary cholesterol source, back into my diet.

While my wife deserves most of the credit for my success, I think technology played a key role as well. Early on, I bought a home cholesterol test meter (a Bioscanner 2000, predecessor to the CardioChek that I now have). The meter allowed me to close the loop between behavior and outcome without the long delay and expense involved with a trip to the doctor. That obviously had a practical benefit, but it was also very motivating.

Continue reading “More power of personal feedback”

Drunker than intended and overinvested

Erling Moxnes on the dangers of forecasting without structural insight and the generic structure behind getting too drunk and underestimating delays when investing in a market, with the common outcome of instability.

More on drinking dynamics here, implemented as a game on Forio (haven’t tried it yet – curious about your experience if you do).