I apologize for the monotony on here this week. This stuff has hijacked my curiosity.
With the eight random numeric survey questions I asked the other day (results here), I was curious to see how well the 'wisdom of the crowd' (a phrase I dislike for describing the phenomenon of errors cancelling each other out) would work when there was no self-selection, no incentives, and discouragement of research. Especially given the small samples of 35 to 39 responses per question, I was not expecting the crowd, that is, the average of the responses, to be very "wise", and I was right...for two of the eight questions. I will get to those in a minute, but first the three questions where the crowd did extremely well, despite the individual guesses often being wildly off:
How much does Roger Federer weigh, in pounds? Crowd: 177.8, Truth: 177
How fast can Usain Bolt run, in miles per hour? Crowd: 25.8, Truth: 27.4
How fast can an elephant run, in miles per hour? Crowd: 21.2, Truth: 22 to 25
The accuracy on Federer is a little spooky, but most impressive, I thought, was the elephant one. I picked the question expecting people to be biased, thinking of elephants as slow, lumbering animals, but not so! (Interestingly, 20 people guessed Usain is faster than an elephant, 15 people guessed the reverse, and 4 thought they were the same.)
And the crowd did reasonably well in answering three others:
How many hours of road travel to go from NYC to LA? Crowd: 48, Truth: 44
How much does a barrel of oil weigh, in pounds? Crowd: 300, Truth: ~250 (depending on type of oil)
How many words per minute can the average adult read? Crowd: 145, Truth: 250 to 300
A handful of people vastly overestimated the travel hours between NYC and LA. I wonder whether they were thinking in miles instead of hours. After removing those outliers, the crowd did pretty well.
I am guessing the crowd wasn't closer on words per minute because people were biased downward by the reference point of typing speed.
And finally, the two questions where the crowd did badly:
How much does the Golden Gate Bridge weigh, in tons? Crowd: 4,000 median, 193,000 mean; Truth: 887,000
How many gallons of oil are spilt in the Gulf of Mexico each minute? Crowd: 5,000 median, 36,000 mean; Truth: 2,042 assuming 70,000 barrels/day, 146 assuming 5,000 barrels/day
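For anyone who wants to check the two "truth" figures for the spill, here is a minimal sketch of the conversion from barrels per day to gallons per minute. It assumes the standard 42-gallon US oil barrel; the function name is just mine for illustration.

```python
GALLONS_PER_BARREL = 42      # assumption: standard US oil barrel
MINUTES_PER_DAY = 24 * 60


def gallons_per_minute(barrels_per_day):
    """Convert a flow rate in barrels/day to gallons/minute."""
    return barrels_per_day * GALLONS_PER_BARREL / MINUTES_PER_DAY


# The two flow estimates quoted in the question above.
for barrels_per_day in (70_000, 5_000):
    rate = gallons_per_minute(barrels_per_day)
    print(f"{barrels_per_day:>6} barrels/day is about {round(rate)} gallons/minute")
```

Rounded to whole gallons, this gives the 2,042 and 146 figures listed above.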
In both of these questions, about half of the individual responses beat the median. That's as bad as it gets. (By definition of the median, no more than half of the responses can be closer to the truth than the median itself.)
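To make the "no more than half" point concrete, here is a rough sketch that counts how many guesses land strictly closer to the truth than the crowd's answer does. The guesses below are made up for illustration (they are not the survey responses), and the helper name is hypothetical.

```python
from statistics import mean, median


def share_beating_crowd(guesses, truth, crowd):
    """Fraction of guesses strictly closer to the truth than the crowd's answer."""
    crowd_error = abs(crowd - truth)
    better = sum(1 for g in guesses if abs(g - truth) < crowd_error)
    return better / len(guesses)


# Hypothetical guesses for a bridge-weight style question, in tons.
guesses = [500, 1_000, 2_000, 4_000, 4_000, 10_000, 50_000, 200_000, 1_000_000]
truth = 887_000

crowd_median = median(guesses)
crowd_mean = mean(guesses)

print("share beating the median:", share_beating_crowd(guesses, truth, crowd_median))
print("share beating the mean:  ", share_beating_crowd(guesses, truth, crowd_mean))
```

A guess can only beat the median if it sits on the same side of the median as the truth, and at most half of the guesses can be there. That is why the share for the median can never go above one half, however skewed the guesses are.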
I suspect two different factors were at play. In the Golden Gate Bridge question, it was a problem of large numbers: our minds simply don't have a good reference point for numbers like 887,000. We can think of a car as about a ton, but asking 'how many cars' worth of weight' for something the size of the Golden Gate Bridge is too much, and people in this example tended to vastly underestimate the true weight.
In the oil spill example, I think it is bias resulting from the images we've seen of a fast-spewing cloud, possibly combined with the large estimates we've been hearing, like 70,000 barrels/day. We have no easy way of knowing how large that spewing cloud is because we have no reference point. I suspect that if the spewing cloud had a person next to it, the crowd would have been much more accurate. Or if I had asked how much water comes out of my faucet per minute, which is something we can easily picture, I'd bet the crowd would have been very accurate.
So what are the lessons?
I would summarize it in a long-ish statement: The crowd can be very "wise" on numeric guessing games, even without a large number of responses, without high individual accuracy, and without any incentives or self-selection, as long as two (not unrelated) conditions are met: (1) there are no severe biases, and (2) the question is referentially in their "strike zone", meaning they have some easy and reliable reference from which to make comparisons (e.g., Usain Bolt vs. driving a car).
Some research suggests that collective cognition works in the following way:
Collective Error = Average Individual Error - Prediction Diversity
The implication is that you can reduce collective error by increasing guessing ability or by increasing the diversity of responses/respondents. This also, by the way, assumes that incentives are in place.
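One way to read that formula, and this is my assumption about what the research means, is with every term measured as a squared error: collective error is the squared error of the average guess, average individual error is the mean squared error of the guesses, and prediction diversity is the variance of the guesses around their own average. Here is a small sketch that computes the three terms for some made-up guesses (not the survey data); the function name is hypothetical.

```python
import math
from statistics import fmean


def decompose(guesses, truth):
    """Return (collective error, average individual error, prediction diversity).

    Assumption: all three terms are squared errors, and the crowd's answer
    is the mean of the guesses.
    """
    crowd = fmean(guesses)
    collective_error = (crowd - truth) ** 2
    avg_individual_error = fmean([(g - truth) ** 2 for g in guesses])
    diversity = fmean([(g - crowd) ** 2 for g in guesses])
    return collective_error, avg_individual_error, diversity


# Hypothetical guesses for a weight question, in pounds.
guesses = [120, 150, 165, 180, 200, 250]
truth = 177

collective, individual, diversity = decompose(guesses, truth)
print("collective error:        ", collective)
print("avg individual error:    ", individual)
print("prediction diversity:    ", diversity)
print("individual - diversity:  ", individual - diversity)
print("identity holds:", math.isclose(collective, individual - diversity, abs_tol=1e-6))
```

Under this reading, "diversity" is the spread of the guesses themselves, not the variety of the people making them, and the relationship is plain arithmetic: guesses that are individually far off can still average out close to the truth if they are scattered widely on both sides of it.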
I have a lot left to learn, but even after my first simple experiment, that model doesn't seem right to me. The individual error on the questions was very high, and the diversity of respondents was probably low because they had all self-selected into reading my idiosyncratic blog. By this model, you would expect the collective error to be very high, and it wasn't at all.
---
Ideas for what to test next
I hope you all are game for some more because I feel like I am just getting started. I am looking for more factors to test. Below are some of my initial thoughts, ordered roughly by interestingness:
Complex ideas, e.g. how much faster is Usain Bolt than an elephant?
Data/Research - whether I give people data and/or encourage them to research
Independence, i.e. whether people can see each other's answers
Diversity - asking same questions to different audiences
Familiarity, i.e. how familiar people are with the subject (e.g. the weight of Roger Federer vs. the weight of my elementary school gym coach)
Anchoring - induce bias by making people think of a number to start with
Incentives - peer-based incentives vs. monetary, and closest-answer-wins vs. anyone-who-beats-crowd-wins
Self-selection, i.e. encourage people to opt out if they don't think they have a good guess
Probabilities
Predictions
Large numbers
And please let me know if you have other ideas.