Real estate appraisal – learning the wrong lesson from failure

I just had my house appraised for a refinance. The appraisal came in at least 20% below my worst-case expectation of market value. The basis of the judgment was comps, about which the best one could say is that they’re in the same county.

I could be wrong. But I think it more likely that the appraisal was rubbish. Why did this happen? I think it’s because real estate appraisal uses unscientific methods that would not pass muster in any decent journal, enabling selection bias and fudge factors to dominate any given appraisal.

When the real estate bubble was on the way up, the fudge factors all provided biased confirmation of unrealistically high prices. In the bust, appraisers got burned. They didn’t learn that their methods were flawed; instead, they concluded that the fudge factors should point down rather than up.

Here’s how appraisals work:

A lender commissions an appraisal. Often the appraiser knows the loan amount or prospective sale price (experimenters used to double-blind trials should be cringing in horror).

The appraiser eyeballs the subject property, and then looks for comparable sales of similar properties within a certain neighborhood in space and time (the “market window”). There are typically 4 to 6 of these, because that’s all that will fit on the standard appraisal form.

The appraiser then adjusts each comp for structure and lot size differences, condition, and other amenities. The scale of adjustments is based on nothing more than gut feeling. There are generally no adjustments for location or timing of sales, because that’s supposed to be handled by the neighborhood and market window criteria.

There’s enormous opportunity for bias, both in the selection of the comp sample and in the adjustments. By cherry-picking the comps and fiddling with adjustments, you can get almost any answer you want. There’s also substantial variance in the answer, but a single point estimate is all that’s ever reported.
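To get a feel for how much room comp selection alone leaves, here is a minimal sketch. The twelve adjusted comp prices are made-up illustrative numbers, not data from my appraisal, and `appraisal_range` is a hypothetical helper. It simply asks: if the form holds 5 comps, what is the lowest and highest average you could report by choosing which 5 to use?

```python
import itertools
import statistics

# Hypothetical adjusted sale prices (in $1000s) for 12 candidate comps.
# Illustrative numbers only.
candidate_comps = [212, 225, 238, 240, 251, 255, 263, 270, 284, 290, 305, 330]


def appraisal_range(prices, k=5):
    """Return (lowest, highest) mean price over every possible choice of k comps."""
    means = [statistics.mean(subset) for subset in itertools.combinations(prices, k)]
    return min(means), max(means)


low, high = appraisal_range(candidate_comps)
all_mean = statistics.mean(candidate_comps)
print(f"mean of all 12 comps: {all_mean:.1f}")
print(f"range of 5-comp averages: {low:.1f} to {high:.1f}")
```

With these assumed numbers, the reportable answer spans well over 20% of the full-sample mean before any adjustment fiddling enters the picture.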

Here’s how they should work:

The lender commissions an appraisal. The appraiser never knows the price or loan amount (though in practice this may be difficult to enforce).

The appraiser fires up a database that selects lots of comps from a wide neighborhood in time and space. Software automatically corrects for timing and location by computing spatial and temporal gradients. It also automatically computes adjustments for lot size, sq ft, bathrooms, etc. by hedonic regression against attributes coded in the database. It connects to utility and city information to determine operating costs – energy and taxes – to adjust for those.

The appraiser reviews the comps, but only to weed out obvious coding errors or properties that are obviously non-comparable for reasons that can’t be adjusted automatically, and visits the property to be sure it’s still there.

The answer that pops out has confidence bounds and other measures of statistical quality attached. As a reality check, the process is repeated for the rental market, to establish whether rent/price ratios indicate an asset bubble.

If those tests look OK, and the answer passes the sniff test, the appraiser reports a plausible range of values. Only if the process fails to converge does some additional judgment come into play.
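The core of that process, hedonic regression plus an uncertainty range, can be sketched in a few lines. This is a toy under loud assumptions, not any patented or commercial method: the comps are synthetic, the "true" coefficients are invented, location is left out entirely, and the range is a crude plus or minus two residual standard deviations that ignores parameter uncertainty. The function names (`solve`, `fit_hedonic`, `appraise`) are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_hedonic(rows, prices):
    """Ordinary least squares of price on an intercept plus attributes.

    Returns (coefficients, residual standard deviation).
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, prices)) for i in range(k)]
    coefs = solve(xtx, xty)
    resid = [y - sum(c * v for c, v in zip(coefs, x)) for x, y in zip(X, prices)]
    sd = (sum(e * e for e in resid) / (len(prices) - k)) ** 0.5
    return coefs, sd


def appraise(coefs, sd, attrs):
    """Return (low, estimate, high) using a rough +/- 2 sd band."""
    est = coefs[0] + sum(c * v for c, v in zip(coefs[1:], attrs))
    return est - 2 * sd, est, est + 2 * sd


# Synthetic comps: (sq ft in 1000s, bathrooms, months since sale); price in $1000s.
# TRUE_COEFS are invented: intercept, $/1000 sq ft, $/bath, $/month of sale age.
TRUE_COEFS = (50.0, 110.0, 15.0, -0.8)
rng = random.Random(1)
rows = [(rng.uniform(1.0, 3.5), rng.randint(1, 3), rng.uniform(0.0, 24.0)) for _ in range(60)]
prices = [
    TRUE_COEFS[0] + sum(c * v for c, v in zip(TRUE_COEFS[1:], r)) + rng.gauss(0.0, 12.0)
    for r in rows
]

coefs, sd = fit_hedonic(rows, prices)
low, est, high = appraise(coefs, sd, (2.2, 2, 0.0))  # 2,200 sq ft, 2 baths, valued today
print(f"estimate {est:.0f}, plausible range {low:.0f} to {high:.0f} ($1000s)")
```

The point of the sketch is the shape of the output: a range derived from 60 comps and fitted adjustments, rather than a point estimate from 5 comps and gut feeling.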

There are several patents on such a process, but no widespread implementation. Most of the time, it would probably be cheaper to do things this way, because less appraiser time would be needed for ultimately futile judgment calls. It might exceed the skillset of the existing population of appraisers, though.

It’s bizarre that lenders don’t expect something better from the appraisal industry. They lose money from current practices on both ends of market cycles. In booms, they (later) suffer excess defaults. In busts, they unnecessarily forgo viable business.

To be fair, fully automatic mass appraisal like Zillow and Trulia doesn’t do very well in my area. I think that’s mostly lack of data access, because they seem to cover only a small subset of the market. Perhaps some human intervention is still needed, but that human intervention would be a lot more effective if it were informed by even the slightest whiff of statistical reasoning and leveraged with some data and computing power.

Update: on appeal, the appraiser raised our valuation 27.5%. Case closed.

Positive feedback drives email list meltdown

I’m on an obscure email list for a statistical downscaling model. I think I’ve gotten about 10 messages in the last two years. But today, that changed.

List traffic (data in red).

Around 7 am, there were a couple of innocuous, topical messages. That prompted someone who’d evidently long forgotten about the list to send an “unsubscribe me” message to the whole list. (Why people can’t figure out that such missives are both ineffective and poor list etiquette is beyond me.) That unleashed a latent vicious cycle: monkey-see, monkey-do produced a few more “unsub” messages. Soon the traffic level became obnoxious, spawning more and more ineffectual unsubs. Then, the brakes kicked in, as more sensible users appealed to people to quit replying to the whole list. Those messages were largely lost in the sea of useless unsubs, and contributed to the overall impression that things were out of control.

People got testy:

I will reply to all to make my point.

Has it occurred to any of you idiots to just reply to Xxxx Xxxx rather than hitting reply to all. Come on already, this is not rocket science here. One person made the mistake and then you all continue to repeat it.

By about 11, the fire was slowing, evidently having run out of fuel (list ignoramuses), and someone probably shut it down by noon – but not before at least a hundred unsubs had flown by.

Just for kicks, I counted the messages and put together a rough-cut Vensim model of this little boom-bust cycle:

unsub.mdl unsub.vpm

This is essentially the same structure as the Bass Diffusion model, with a few refinements. I don’t think I quite captured the unsubscriber behavior. Here, I assume that would-be unsubscribers, who think they’ve left the list but haven’t, at least quit sending messages. In reality, they didn’t – in blissful ignorance of what was going on, several sent multiple requests to be unsubscribed. I didn’t explicitly represent the braking effect (if any) of corrective comments. Also, the time constants for corrections and unsubscriptions could probably be separated. But it has the basics – a positive feedback loop driving growth in messages, and a negative feedback loop putting an end to the growth. Anyway, have fun with it.
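For readers without Vensim, the bare Bass structure can be sketched in Python. This is not a translation of unsub.mdl: it omits the refinements, and the parameter values (120 susceptible list members, spontaneous rate `p`, imitation rate `q`) are guesses chosen only to give a boom and bust over roughly the right number of hours.

```python
def simulate_unsubs(n_susceptible=120.0, p=0.02, q=1.8, hours=5.0, dt=0.01):
    """Euler-integrate a plain Bass diffusion model.

    Returns a list of (time in hours, unsub messages per hour, cumulative senders).
    """
    senders = 0.0  # people who have already sent an "unsubscribe me" message
    t = 0.0
    history = []
    for _ in range(int(round(hours / dt))):
        remaining = n_susceptible - senders
        # p: spontaneous unsubs; q * senders / n: monkey-see, monkey-do (positive loop).
        # The shrinking pool of remaining list members is the negative loop.
        rate = (p + q * senders / n_susceptible) * remaining
        history.append((t, rate, senders))
        senders += rate * dt
        t += dt
    return history


history = simulate_unsubs()
peak_time, peak_rate, _ = max(history, key=lambda row: row[1])
print(f"peak of {peak_rate:.0f} unsubs/hour about {peak_time:.1f} hours after the first message")
print(f"cumulative unsub senders after 5 hours: {history[-1][2]:.0f}")
```

The imitation term makes traffic grow roughly exponentially at first, and depletion of people who have not yet sent an unsub ends the growth, which is the same pair of loops described above.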

Computing and networks have solved a lot of problems, like making logistics pipelines visible, but they’ve created as many new ones. The need for models to improve intuition and manage new problems is as great as ever.

Climate incentives

Richard Lindzen and many others have long maintained that climate science promotes alarm in order to secure funding. For example:

Regarding Professor Nordhaus’s fifth point that there is no evidence that money is at issue, we simply note that funding for climate science has expanded by a factor of 15 since the early 1990s, and that most of this funding would disappear with the absence of alarm. Climate alarmism has expanded into a hundred-billion-dollar industry far broader than just research. Economists are usually sensitive to the incentive structure, so it is curious that the overwhelming incentives to promote climate alarm are not a consideration to Professor Nordhaus. There are no remotely comparable incentives to the contrary position provided by the industries that he claims would be harmed by the policies he advocates.

I’ve always found this idea completely absurd, but to prep for an upcoming talk I decided to collect some rough numbers. A picture says it all:

Data

Notice that it’s completely impractical to make the scale large enough to see any detail in climate science funding or NGOs. I didn’t even bother to include the climate-specific NGOs, like 350.org and USCAN, because they are too tiny to show up (under $10m/yr). Yet, if anything, my tally of the climate-related activity is inflated. For example, a big slice of US Global Change Research is remote sensing (56% of the budget is NASA), which is not strictly climate-related. The cleantech sector is highly fragmented and diverse, and driven by many incentives other than climate. Over 2/3 of the NGO revenue stream consists of Ducks Unlimited and the Nature Conservancy, which are not primarily climate advocates.

Nordhaus, hardly a tree hugger himself, sensibly responds,

As a fifth point, they defend their argument that standard climate science is corrupted by the need to exaggerate warming to obtain research funds. They elaborate this argument by stating, “There are no remotely comparable incentives to the contrary position provided by the industries that he claims would be harmed by the policies he advocates.”

This is a ludicrous comparison. To get some facts on the ground, I will compare two specific cases: that of my university and that of Dr. Cohen’s former employer, ExxonMobil. Federal climate-related research grants to Yale University, for which I work, averaged $1.4 million per year over the last decade. This represents 0.5 percent of last year’s total revenues.

By contrast, the sales of ExxonMobil, for which Dr. Cohen worked as manager of strategic planning and programs, were $467 billion last year. ExxonMobil produces and sells primarily fossil fuels, which lead to large quantities of CO2 emissions. A substantial charge for emitting CO2 would raise the prices and reduce the sales of its oil, gas, and coal products. ExxonMobil has, according to several reports, pursued its economic self-interest by working to undermine mainstream climate science. A report of the Union of Concerned Scientists stated that ExxonMobil “has funneled about $16 million between 1998 and 2005 to a network of ideological and advocacy organizations that manufacture uncertainty” on global warming. So ExxonMobil has spent more covertly undermining climate-change science than all of Yale University’s federal climate-related grants in this area.

Money isn’t the whole story. Science is self-correcting, at least if you believe in empiricism and some kind of shared underlying physical reality. If funding pressures could somehow overcome the gigantic asymmetry of resources to favor alarmism, the opportunity for a researcher to have a Galileo moment would grow as the mainstream accumulated unsolved puzzles. Sooner or later, better theories would become irresistible. But that has not been the history of climate science; alternative hypotheses have been more risible than irresistible.

Given the scale of the numbers, each of the big 3 oil companies could run a climate science program as big as the US government’s, for 1% of revenues. Surely the NPV of their potential costs, if faced with a real climate policy, would justify that. But they don’t. Why? Perhaps they know that they wouldn’t get a different answer, or that it’s far cheaper to hire shills to make stuff up than to do real science?
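The arithmetic behind these comparisons is easy to check. The sketch below uses only the figures quoted from Nordhaus above. The size of the US government’s climate science program is not stated in this post, so it is deliberately left out rather than guessed.

```python
# Figures quoted in the Nordhaus passage above (USD).
yale_climate_grants_per_year = 1.4e6  # federal climate-related grants, decade average
exxon_advocacy_1998_2005 = 16e6       # per the Union of Concerned Scientists report
exxon_sales_per_year = 467e9          # ExxonMobil sales "last year"

yale_decade_total = yale_climate_grants_per_year * 10
one_percent_of_exxon_sales = 0.01 * exxon_sales_per_year
sales_to_grants_ratio = exxon_sales_per_year / yale_climate_grants_per_year

print(f"Yale climate grants over a decade: ${yale_decade_total / 1e6:.0f} million")
print(f"ExxonMobil advocacy funding, 1998-2005: ${exxon_advocacy_1998_2005 / 1e6:.0f} million")
print(f"1% of ExxonMobil sales: ${one_percent_of_exxon_sales / 1e9:.2f} billion per year")
print(f"ExxonMobil sales / Yale climate grants: about {sales_to_grants_ratio:,.0f} to 1")
```

A decade of Yale’s climate grants comes to $14 million, less than the $16 million cited for ExxonMobil’s advocacy network, and 1% of one company’s sales is $4.67 billion per year.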

A Geoff Coyle reading list

The System Dynamics Society reports that SD pioneer Geoff Coyle has passed away.

We report the sad news that longtime system dynamicist R. Geoffrey Coyle died on November 19, 2012. Geoff was 74. He started his career as a mining engineer. Having completed a PhD in Operations Research, he came to Cambridge, Massachusetts from the UK in the late 1960’s, and studied with Jay Forrester to learn system dynamics. Upon his return to the UK, he started to develop system dynamics in England. He was the founder of the first system dynamics group in the UK, at the University of Bradford in 1970. This group grew terrifically and produced some of the most important people in our field. Geoff and his students have made enormously important contributions to the field and the next generation of their students have as well, all following in Geoff’s footsteps and under his tutelage.

Geoff and the Bradford group also founded the first system dynamics journal, Dynamica. They created DYSMAP, the first system dynamics software that had built-in optimization and built-in dimensional consistency technique.

Geoff authored a number of very important books in the field including: Management in System Dynamics (1977), System Dynamics Modelling: A Practical Approach (1996) and Practical Strategy: Tools and Techniques (2004). In 1998, he was the first recipient of the Lifetime Achievement Award of the System Dynamics Society. More recently he returned to his first academic love and wrote a highly acclaimed history of mining in the UK: The riches beneath our feet (2010). This is a wonderful legacy in the field of system dynamics and beyond.

I realized that, while I’ve always enjoyed his irascibly interesting presentations, I’ve only read a few of his works. So, I’ve collected a Coyle reading list.

Not even wrong: a school board’s discussion of systems thinking

Socialism. Communism. “Nazism.” American Exceptionalism. Indoctrination. Buddhism. Meditation. “Americanism.” These are not words or terms one would typically expect to hear in a Winston-Salem/Forsyth County School Board meeting. But in the Board’s last meeting on October 9th, they peppered the statements of public commenters and Board Members alike.

The object of this invective? Systems thinking. You really have to read part 1 and part 2 of Camel City Dispatch’s article to get an appreciation for the school board’s discussion of the matter.

I know that, as a systems thinker, I should look for the unstated assumptions that led board members to their critiques, and establish a constructive dialog. But I just can’t do it – I have to call out the fools. While there are some voices of reason, several of the board members and commenters apparently have no understanding of the terms they bandy about, and have no business being involved in the education of anyone, particularly children.

The low point of the exchange:

Jeannie Metcalf said she “will never support anything that has to do with Peter Senge… I don’t care what [the teachers currently trained in System’s Thinking] are teaching. I don’t care what lessons they are doing. He’s is trying to sell a product. Once it insidiously makes its way into our school system, who knows what he’s going to do. Who knows what he’s going to do to carry out his Buddhist way of thinking and his hatred of Capitalism. I know y’all are gonna be thinkin’ I’m a crazy person, but I’ve been around a long time.”

Yep, you’re crazy all right. In your imaginary parallel universe, “hatred of capitalism” must be a synonym for writing one of the most acclaimed business books ever, sitting at one of the best business schools in the world, and consulting at the highest levels of many Fortune 50 companies.

The common thread among the ST critics appears to be a total failure to actually observe classrooms combined with shoot-the-messenger reasoning from consequences. They see, or imagine, a conclusion that they don’t like, something that appears vaguely environmental or socialist, and assume that it must be part of the hidden agenda of the curriculum. In fact, as supporters pointed out, ST is a method, which could as easily be applied to illustrate the benefits of individualism, markets, or whatnot, as long as they are logically consistent. Of course, if one’s pet virtue has limits or nuances, ST may also reveal those – particularly when simulation is used to formalize arguments. That is what the critics are really afraid of.

A small victory for scientific gobbledygook, arithmetic and Nate Silver

Nate Silver of 538 deserves praise for calling the election in all 50 states, using a fairly simple statistical model and lots of due diligence on the polling data. When the dust settles, I’ll be interested to see a more detailed objective evaluation of the forecast (e.g., some measure of skill, like likelihoods).
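As a sketch of what a likelihood-based skill measure might look like, the snippet below scores a set of probabilistic state calls by their mean log-likelihood and compares that with a coin-flip baseline. The five probabilities and outcomes are invented for illustration and are not 538’s actual forecasts. `log_score` is a hypothetical helper.

```python
import math


def log_score(forecast_probs, outcomes):
    """Mean log-likelihood of the observed binary outcomes under the forecast.

    forecast_probs[i] is the stated probability that candidate A wins state i;
    outcomes[i] is True if candidate A actually won. Closer to 0 is better.
    """
    total = 0.0
    for p, won in zip(forecast_probs, outcomes):
        total += math.log(p if won else 1.0 - p)
    return total / len(outcomes)


# Invented example: five states, forecast probability that candidate A wins each.
forecast = [0.95, 0.80, 0.60, 0.10, 0.03]
outcomes = [True, True, True, False, False]

model_score = log_score(forecast, outcomes)
coin_flip_score = log_score([0.5] * len(outcomes), outcomes)
print(f"forecast log score:  {model_score:.3f}")
print(f"coin-flip log score: {coin_flip_score:.3f}")
```

Unlike a simple count of states called correctly, this kind of score rewards well-calibrated confidence and punishes confident misses heavily.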

Many have noted that his approach stands in stark contrast to big-ego punditry:

Another impressive model-based forecasting performance occurred just days before the election, with successful prediction of Hurricane Sandy’s turn to landfall on the East Coast, almost a week in advance.

On October 22, you blogged that there was a possibility it could hit the East Coast. How did you know that?

There are a few rather reliable global models. They’re models that run all the time, all year long, so they don’t focus on any one storm. They run for the entire globe, not just for North America. There are two types of runs these models can be configured to do. One is called a deterministic run and that’s where you get one forecast scenario. Then the other mode, and I think this is much more useful, especially at longer ranges where things become much more uncertain, is ensemble—where 20 or 40 or 50 runs can be done. They are not run at as high of a resolution as the deterministic run, otherwise it would take forever, but it’s still incredibly helpful to look at 20 runs.

Because you have variation? Do the ensemble runs include different winds, currents, and temperatures?

You can tweak all sorts of things to initialize the various ensemble members: the initial conditions, the inner-workings of the model itself, etc. The idea is to account for observational error, model error, and other sources of uncertainty. So you come up with 20-plus different ways to initialize the model and then let it run out in time. And then, given the very realistic spread of options, 15 of those ensemble members all recurve the storm back to the west when it reaches the East coast, and only five of them take it northeast. That certainly has some information content. And then, one run after the next, you can watch those. If all of the ensemble members start taking the same track, it doesn’t necessarily make them right, but it does mean it’s more likely to be right. You have much more confidence forecasting a track if the model guidance is in good agreement. If it’s a 50/50 split, that’s a tough call.

– Outside
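The deterministic-versus-ensemble distinction can be illustrated with a toy. The model below is emphatically not a weather model: it is a one-variable system, dx/dt = x - x^3 + push, that settles into one of two outcomes, which I arbitrarily label "recurves west" (x > 0) and "heads northeast" (x < 0). The perturbation sizes, member count, and all names are assumptions made for the sketch.

```python
import random


def run_member(x0, push, steps=400, dt=0.05):
    """Euler-integrate dx/dt = x - x**3 + push; the sign of the final x is the 'track'."""
    x = x0
    for _ in range(steps):
        x += (x - x**3 + push) * dt
    return x


def run_ensemble(n_members=20, seed=0):
    """Perturb the initial condition and a model parameter for each member.

    Returns (members ending at x > 0, members ending at x <= 0).
    """
    rng = random.Random(seed)
    finals = []
    for _ in range(n_members):
        x0 = rng.gauss(0.05, 0.1)     # stands in for observational error
        push = rng.gauss(0.0, 0.02)   # stands in for model error
        finals.append(run_member(x0, push))
    west = sum(1 for x in finals if x > 0)
    return west, n_members - west


deterministic = run_member(0.05, 0.0)  # one best-guess run, one answer
west, northeast = run_ensemble()
print(f"deterministic run ends at x = {deterministic:.2f}")
print(f"ensemble: {west} members recurve west, {northeast} head northeast")
```

The single deterministic run gives one confident-looking answer, while the split among ensemble members shows how sensitive that answer is to small errors in the inputs.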
