From Daniel Kahneman, “Thinking, Fast and Slow”
Even Statisticians Get Fooled by the Small Numbers Bias.
- The law of small numbers states:
“Large samples are more precise than small samples.”
“Small samples yield extreme results more often than large samples do.” (p. 111)
In other words, because small samples yield extreme results more often, their results can’t be fully trusted.
- The Law of Small Numbers fools most people. They let the stories and biases in their heads shape how they interpret data from small samples. Even sophisticated researchers and statisticians fall into the trap of over-interpreting data from small samples.
Why do experts and laypeople alike put faith in the results of small samples even though they know about their flaws?
- “The exaggerated faith in small samples is only one example of a more general illusion—we pay more attention to the content of messages than to information about their reliability, and as a result end up with a view of the world around us that is simpler and more coherent than the data justify. Jumping to conclusions is a safer sport in the world of our imagination than it is in reality.” (p. 118)
Examples of small sample biases
- 1. In a telephone poll of 300 seniors, 60% support the President.
What are your first thoughts about this?
You might come up with a causal explanation for why most seniors support the President. But the sample size matters as much as the headline figure: assuming a simple random sample, a poll of 300 carries a margin of error of roughly ±5.5 percentage points at 95% confidence, and most readers ignore that detail entirely. What if this were reported by the NY Times, or mentioned at the water cooler? How would that affect your thinking?
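To put a rough number on how much a poll result can be trusted, here is a minimal sketch of the standard margin-of-error calculation for a proportion. It assumes a simple random sample, a 95% confidence level (z = 1.96), and the normal approximation. None of these details come from the book, and the function name `margin_of_error` is just an illustrative choice.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Half-width of the normal-approximation confidence interval for a proportion.

    Assumes a simple random sample; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Same 60% headline figure, three different sample sizes.
for n in (30, 300, 3000):
    moe = margin_of_error(0.60, n)
    print(f"n={n:>5}: 60% +/- {moe * 100:.1f} points")
# n=   30: 60% +/- 17.5 points
# n=  300: 60% +/- 5.5 points
# n= 3000: 60% +/- 1.8 points
```

The headline figure is identical in all three cases. Only the reliability changes, which is exactly the information people tend to skip over.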
- 2. A study shows kidney cancer incidence is lowest in rural counties. What are your first thoughts about this?
“The counties in which incidence of kidney cancer is lowest are mostly rural, sparsely populated, and located in traditionally Republican states in the Midwest, the South, and the West.”
“It is both easy and tempting to infer that their low cancer rates are directly due to the clean living of the rural lifestyle—no air pollution, no water pollution, access to fresh food without additives.” This makes perfect sense. (p. 109)
The study also found that kidney cancer incidence is highest in counties that are mostly rural, sparsely populated, and located in traditionally Republican states in the Midwest, the South, and the West. How do you explain this?
“It is easy to infer that their high cancer rates might be directly due to the poverty of the rural lifestyle—no access to good medical care, a high-fat diet, and too much alcohol, too much tobacco.” (p. 109)
How can the rural lifestyle produce both the highest and the lowest kidney cancer rates? It has nothing to do with rural lifestyles. Rural counties simply have small populations, and small samples yield extreme results, in both directions, more often than large samples do.
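The kidney cancer puzzle can be reproduced with a small simulation. The sketch below gives every county exactly the same underlying rate, so any difference in observed rates is pure sampling noise. The populations (100 vs. 10,000), the 5% rate, and the function names are all hypothetical values chosen for illustration, not real epidemiological figures.

```python
import random

def observed_rate(population, true_rate, rng):
    """Simulate one county: each resident is a case with probability true_rate."""
    cases = sum(rng.random() < true_rate for _ in range(population))
    return cases / population

def simulate_counties(n_each=100, small=100, large=10_000, true_rate=0.05, seed=0):
    """Return (observed_rate, size_label) pairs sorted from lowest to highest rate."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counties = []
    for size_label, population in (("small", small), ("large", large)):
        for _ in range(n_each):
            counties.append((observed_rate(population, true_rate, rng), size_label))
    return sorted(counties)

ranked = simulate_counties()
lowest, highest = ranked[:10], ranked[-10:]
print("small counties among the 10 lowest rates: ",
      sum(label == "small" for _, label in lowest))
print("small counties among the 10 highest rates:",
      sum(label == "small" for _, label in highest))
```

Under these assumptions the small counties should dominate both ends of the ranking, even though no county is actually healthier or sicker than any other.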
- 3. “Take the sex of six babies born in sequence at a hospital. The sequence of boys and girls is obviously random; the events are independent of each other, and the number of boys and girls who were born in the hospital in the last few hours has no effect whatsoever on the sex of the next baby.” (p. 115)
Now consider three possible sequences:
BBBGGG
GGGGGG
BGBBGB
Are these sequences equally likely?
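Most people’s intuition says the third one is more likely because it looks more random. This is false. They are all almost equally likely: if boys and girls are equally probable and births are independent, each specific sequence of six births has a probability of (1/2)^6 = 1/64.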
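The arithmetic behind the three sequences is easy to check. This is a small sketch assuming independent births and, by default, an exactly 50/50 chance of a boy or a girl. The `p_boy` parameter and the function name are illustrative additions, not something from the book.

```python
from fractions import Fraction

def sequence_probability(sequence, p_boy=Fraction(1, 2)):
    """Probability of one exact birth sequence, assuming independent births."""
    prob = Fraction(1)
    for child in sequence:
        # Multiply by P(boy) for a "B", otherwise by P(girl) = 1 - P(boy).
        prob *= p_boy if child == "B" else 1 - p_boy
    return prob

for seq in ("BBBGGG", "GGGGGG", "BGBBGB"):
    print(seq, sequence_probability(seq))
# BBBGGG 1/64
# GGGGGG 1/64
# BGBBGB 1/64
```

With an exact 50/50 split, every specific sequence comes out to 1/64 no matter how "random" it looks. Passing a `p_boy` slightly different from 1/2 makes the three values differ a little, which is one way to read the "almost" in "almost equally likely".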
Other Good Quotes:
“Statistics produce many observations that appear to beg for causal explanations but do not lend themselves to such explanations. Many facts of the world are due to chance, including accidents of sampling. Causal explanations of chance events are inevitably wrong.” (p. 118)