From “Noise” by Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein
Appendix C: Correcting Predictions (p. 360)
“We make matching predictions when we rely on the information we have to make a forecast and behave as if this information were perfectly (or very highly) predictive of the outcome.”
“Recall the example of Julie, who could “read fluently when she was four years old.” The question was, what is her GPA?”
“If you predicted 3.8 for Julie’s college GPA, you intuitively judged that the four-year-old Julie was in the top 10% of her age group by reading age (although not in the top 3–5%). You then, implicitly, assumed that Julie would also rank somewhere around the 90th percentile of her class in terms of GPA. This corresponds to a GPA of 3.7 or 3.8—hence the popularity of these answers. What makes this reasoning statistically incorrect is that it grossly overstates the diagnostic value of the information available about Julie.”
“More often than not, in fact, outstanding performance will become less outstanding. Conversely, very poor performance will improve. It is easy to imagine social, psychological, or even political reasons for this observation, but reasons are not required. The phenomenon is purely statistical. Extreme observations in one direction or the other will tend to become less extreme, simply because past performance is not perfectly correlated with future performance. This tendency is called regression to the mean (hence the technical term nonregressive for matching predictions, which fail to take it into account). To put it quantitatively, the judgment you made about Julie would be correct if reading age were a perfect predictor of GPA, that is, if there were a correlation of 1 between the two factors. That is obviously not the case.”
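The purely statistical nature of regression to the mean is easy to verify with a small simulation. The sketch below (our own illustration, not from the book; the correlation of 0.5 is an arbitrary choice) generates pairs of "past" and "future" scores with an imperfect correlation, then checks the average future score of the top decile of past performers:

```python
import math
import random
import statistics

random.seed(42)
r = 0.5          # assumed (imperfect) correlation between past and future scores
n = 10_000

# Standard-normal "past performance" scores.
past = [random.gauss(0, 1) for _ in range(n)]
# Construct "future" so that corr(past, future) is approximately r.
future = [r * p + math.sqrt(1 - r**2) * random.gauss(0, 1) for p in past]

# Take the top 10% of performers on the past measure.
top = sorted(zip(past, future), reverse=True)[: n // 10]
mean_past = statistics.mean(p for p, _ in top)
mean_future = statistics.mean(f for _, f in top)

print(f"top decile, past score:    {mean_past:.2f}")
print(f"same people, future score: {mean_future:.2f}")  # less extreme than past
```

The top performers remain above average in the future, but by less than before: no social or psychological mechanism is involved, only the imperfect correlation.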
1. Make your intuitive guess.
“Your intuition about Julie, or about any case about which you have information, is not worthless.”
“This guess is the prediction you would make if the information you have were perfectly predictive. Write it down.”
2. Look for the mean.
“Now, step back and forget what you know about Julie for a moment. What would you say about Julie’s GPA if you knew absolutely nothing about her? The answer, of course, is straightforward: in the absence of any information, your best guess of Julie’s GPA would have to be the mean GPA in her graduating class—probably somewhere around 3.2. Looking at Julie this way is an application of the broader principle we have discussed above, the outside view. When we take the outside view, we think of the case we are considering as an instance of a class, and we think about that class in statistical terms. Recall, for instance, how taking the outside view about the Gambardi problem leads us to ask what the base rate of success is for a new CEO (see chapter 4).”
3. Estimate the diagnostic value of the information you have.
“This is the difficult step, where you need to ask yourself, “What is the predictive value of the information I have?” The reason this question matters should be clear by now. If all you knew about Julie was her shoe size, you would correctly give this information zero weight and stick to the mean GPA prediction. If, on the other hand, you had the list of grades Julie has obtained in every subject, this information would be perfectly predictive of her GPA (which is their average). There are many shades of gray between these two extremes. If you had data about Julie’s exceptional intellectual achievements in high school, this information would be much more diagnostic than her reading age, but less than her college grades. Your task here is to quantify the diagnostic value of the data you have, expressed as a correlation with the outcome you are predicting.”
“Except in rare cases, this number will have to be a back-of-the-envelope estimate. To make a sensible estimate, remember some of the examples we listed in chapter 12. In the social sciences, correlations of more than .50 are very rare. Many correlations that we recognize as meaningful are in the .20 range. In Julie’s case, a correlation of .20 is probably an upper bound.”
4. Adjust.
“The final step is a simple arithmetic combination of the three numbers you have now produced: you must adjust from the mean, in the direction of your intuitive guess, in proportion to the correlation you have estimated. This step simply extends the observation we have just made: if the correlation were 0, you would stick to the mean; if it were 1, you would disregard the mean and happily make a matching prediction.”
“In Julie’s case, then, the best prediction you can make of GPA is one that lies no more than 20% of the way from the mean of the class in the direction of the intuitive estimate that her reading age suggested to you. This computation leads you to a prediction of about 3.3.”
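The adjustment rule can be written as a one-line function: start from the mean and move toward the intuitive guess in proportion to the estimated correlation. A minimal sketch (the function name is ours; the numbers are those from the text):

```python
def corrected_prediction(mean, intuitive_guess, correlation):
    """Adjust from the mean toward the intuitive guess, in proportion to r."""
    return mean + correlation * (intuitive_guess - mean)

# Julie: class mean GPA ~3.2, matching prediction 3.8, correlation at most 0.20.
julie = corrected_prediction(mean=3.2, intuitive_guess=3.8, correlation=0.20)
print(round(julie, 2))  # 3.32, i.e. "about 3.3"
```

Note the two limiting cases from the text fall out directly: with correlation 0 the function returns the mean, and with correlation 1 it returns the matching prediction unchanged.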
“We have used Julie’s example, but this method can be applied just as easily to many of the judgment problems we have discussed in this book. Consider, for instance, a vice president of sales who is hiring a new salesperson and has just had an interview with an absolutely outstanding candidate. Based on this strong impression, the executive estimates that the candidate should book sales of $1 million in the first year on the job—twice the mean amount achieved by new hires during their first year on the job.”
“How could the vice president make this estimate regressive? The calculation depends on the diagnostic value of the interview. How well does a recruiting interview predict on-the-job success in this case? Based on the evidence we have reviewed, a correlation of .40 is a very generous estimate. Accordingly, a regressive estimate of the new hire’s first-year sales would be, at most, $500K + ($1 million − $500K) × .40 = $700K. This process, again, is not at all intuitive. Notably, as the examples illustrate, corrected predictions will always be more conservative than intuitive ones: they will never be as extreme as intuitive predictions, but instead closer, often much closer, to the mean.”
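The same adjustment arithmetic, applied to the hiring example (numbers taken from the text above):

```python
mean_sales = 500_000   # mean first-year sales achieved by new hires
intuitive = 1_000_000  # the vice president's matching prediction
r = 0.40               # generous estimate of the interview's diagnostic value

# Adjust from the mean toward the intuitive guess, in proportion to r.
corrected = mean_sales + r * (intuitive - mean_sales)
print(f"${corrected:,.0f}")  # $700,000
```

Because r is always between 0 and 1, the corrected figure necessarily lands between the mean and the intuitive prediction, which is why corrected predictions are always the more conservative of the two.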