Participants in my Vensim mini-course at the 2013 System Dynamics Conference outperformed their colleagues from 2012 on the Capen Quiz (mean of 5 right vs. 4 last year).
Five right is well above the typical performance of the public, but sadly this means that few among us are destined to be CEOs, who are often wildly overconfident (console yourself – abject failure on the quiz can make you a titan of industry).
Take the quiz and report back!
I took the quiz. One could force an 80% result by intentionally choosing wide bounds on eight questions and narrow, silly bounds on two questions, but I did my best to pick bounds that would be meaningful while still containing the right answer, and I happened to get 8/10 right. My answers were:
1. 100 ft – 1000 ft (correct)
2. 10k miles – 10m miles (correct)
3. 1950-1965 (correct)
4. 200-500 (correct)
5. 100-3,000 (correct)
6. 100k-20m (correct)
7. 10k-100m (correct)
8. 100-100k (wrong)
9. 1 in – 2.5 in (correct)
10. 1k ft – 10k ft (wrong)
It’s an odd task because you must decide how to balance two competing goals: how many questions you want to get right (which encourages wide bounds) and how meaningful or bold you want your bounds to be (which encourages narrow bounds). Without any guidance on how to balance these, it feels more like a psychology test than a game or a test of estimation skill.
Perhaps one way to improve the test would be to factor the bounds you choose into your score. Tighter bounds earn more points, looser bounds earn fewer, and a wrong answer earns zero. For example, maybe you get 100 points for a correct answer with bounds that span less than one order of magnitude, 50 points for a correct answer with bounds that span up to two orders of magnitude, 10 for up to three orders of magnitude, etc. (The cutoffs for the different point values might need to be rescaled depending on the question. The golf ball diameter question, for example, would be too easy to get right within a single order of magnitude.)
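To make the tiered idea concrete, here is a minimal sketch in Python. The function names, the exact cutoffs, and the choice to give zero points beyond three orders of magnitude are all assumptions for illustration (the comment leaves that at "etc."), and the width is measured as the base-10 log of the ratio of the bounds, which only works for positive bounds.

```python
import math

# Hypothetical tiers: (maximum width in orders of magnitude, points awarded).
# The 100/50/10 values follow the suggestion above; the cutoffs are assumed.
TIERS = [(1.0, 100), (2.0, 50), (3.0, 10)]


def width_in_orders(low, high):
    """Width of [low, high] in orders of magnitude (base-10 log of the ratio)."""
    if low <= 0 or high < low:
        raise ValueError("bounds must satisfy 0 < low <= high")
    return math.log10(high / low)


def score_answer(low, high, truth):
    """Points for one question: zero if the range misses, else tiered by width."""
    if not (low <= truth <= high):
        return 0
    width = width_in_orders(low, high)
    for max_width, points in TIERS:
        # Small tolerance so a range of exactly N orders lands in tier N.
        if width <= max_width + 1e-9:
            return points
    return 0  # Assumption: ranges wider than the last tier earn nothing.


def score_quiz(answers):
    """Total over (low, high, truth) triples."""
    return sum(score_answer(low, high, truth) for low, high, truth in answers)
```

Under these assumed tiers, a correct 1 in to 2.5 in range (about 0.4 orders of magnitude) would earn 100 points, while a correct 10k to 100m range (four orders) would earn nothing. That also illustrates the rescaling caveat: the cutoffs would probably need to differ per question.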
Nice points. The gaming approach (8 wide, 2 narrow) is definitely a problem, but it doesn’t come up when people treat this as just a fun thinking exercise.
I like the idea of changing the scoring, though it’s not really compatible with doing this on paper, self-scored, in a room full of people. One option would be to give people a likelihood score. For example, assume that their range is an 80% interval for a lognormal distribution. Then give them a closeness score: the sum, over the questions, of the likelihood of each true answer under the distribution implied by their range. Then the incentives are balanced, and gaming wouldn’t work. However, people who are better informed (airplane buffs or golfers) would then have an advantage, which isn’t what we’re after. Hmmm…
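A rough sketch of that likelihood score in Python, under some loudly stated assumptions: the stated range is treated as the central 80% interval (10th to 90th percentile) of a lognormal, and the density is evaluated in log space, i.e. the normal density of ln(answer), so the score does not depend on the units of each question. That log-space choice and all function names are mine, not part of the original suggestion.

```python
import math
from statistics import NormalDist

# z-value of the 90th percentile of a standard normal (about 1.2816), so that
# mu +/- Z80 * sigma covers the central 80% of the distribution.
Z80 = NormalDist().inv_cdf(0.9)


def fit_log_space_normal(low, high):
    """Normal distribution of ln(x) whose central 80% interval is [ln low, ln high]."""
    if low <= 0 or high <= low:
        raise ValueError("bounds must satisfy 0 < low < high")
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z80)
    return NormalDist(mu, sigma)


def likelihood(low, high, truth):
    """Density of ln(truth) under the distribution implied by the range.

    Assumption: evaluated in log space, which keeps the score unit-free.
    """
    return fit_log_space_normal(low, high).pdf(math.log(truth))


def likelihood_score(answers):
    """Sum of per-question likelihoods over (low, high, truth) triples."""
    return sum(likelihood(low, high, truth) for low, high, truth in answers)
```

With this scoring, a narrow range beats a wide one when the truth sits near its middle, but loses badly when the truth falls outside it, so neither very wide nor very narrow bounds are a safe strategy.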