Noise is a book about human judgement. The book's distinctive framing is that a judgement is an attempt to use the human mind as a measuring instrument --- and the human mind is an unreliable instrument. Your judgement is not just biased by your lived experience, but also inconsistent: your decisions about important things (e.g., a judge sentencing someone) can be affected by the weather, how hungry you are, and whether or not you just had a spat with your spouse:
Eliminating bias from a set of judgments will not eliminate all error. The errors that remain when bias is removed are not shared. They are the unwanted divergence of judgments, the unreliability of the measuring instrument we apply to reality. They are noise. Noise is variability in judgments that should be identical. (Kindle loc 5183)
The early part of the book establishes several things:
- Many simple algorithms outperform expert humans (e.g. mechanical diagnostic rules outperform many doctors) purely because they're consistent
- Machine learning algorithms do even better, but not by a lot
- Humans are unforgiving of algorithms once they've seen one make a mistake (this is why self-driving cars can't merely match the standard of driving that humans achieve to be accepted --- they have to outperform humans in nearly all situations)
- It's sexy to have an anti-bias program, but unsexy to talk about noise. Yet from the point of view of judgement errors, bias and noise have the same impact, and noise is easier to fix than bias
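The claim that bias and noise contribute symmetrically to error follows from the standard decomposition of mean squared error into bias squared plus variance. A small simulation (my own illustration, not from the book; the `truth`, `bias`, and `noise_sd` values are arbitrary) makes the identity concrete:

```python
import random

random.seed(0)
truth = 100.0   # the quantity being judged
bias = 5.0      # systematic offset shared by all judgments
noise_sd = 5.0  # judgment-to-judgment variability (noise)

# Simulate many judgments of the same quantity.
judgments = [truth + bias + random.gauss(0, noise_sd) for _ in range(100_000)]

n = len(judgments)
mse = sum((j - truth) ** 2 for j in judgments) / n
mean_j = sum(judgments) / n
observed_bias = mean_j - truth
variance = sum((j - mean_j) ** 2 for j in judgments) / n

# MSE decomposes exactly into bias^2 + variance, so equal amounts of
# bias and noise contribute equally to total error.
print(mse, observed_bias ** 2 + variance)
```

With bias and noise standard deviation set equal, each accounts for about half the total error, which is the book's point: reducing either helps the same amount.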
people are willing to give an algorithm a chance but stop trusting it as soon as they see that it makes mistakes. On one level, this reaction seems sensible: why bother with an algorithm you can’t trust? As humans, we are keenly aware that we make mistakes, but that is a privilege we are not prepared to share. We expect machines to be perfect. If this expectation is violated, we discard them. Because of this intuitive expectation, however, people are likely to distrust algorithms and keep using their judgment, even when this choice produces demonstrably inferior results. This attitude is deeply rooted and unlikely to change until near-perfect predictive accuracy can be achieved. Fortunately, much of what makes rules and algorithms better can be replicated in human judgment. (kindle loc 1917)
One interesting thing is that there are people who exhibit less noise than others. This was covered by Philip Tetlock in his research on political experts and predictions. The super-forecasters have one particular characteristic that's important:
To be actively open-minded is to actively search for information that contradicts your preexisting hypotheses. Such information includes the dissenting opinions of others and the careful weighing of new evidence against old beliefs. Actively open-minded people agree with statements like this: “Allowing oneself to be convinced by an opposing argument is a sign of good character.” They disagree with the proposition that “changing your mind is a sign of weakness” or that “intuition is the best guide in making decisions.” In other words, while the cognitive reflection and need for cognition scores measure the propensity to engage in slow and careful thinking, actively open-minded thinking goes beyond that. It is the humility of those who are constantly aware that their judgment is a work in progress and who yearn to be corrected. We will see in chapter 21 that this thinking style characterizes the very best forecasters, who constantly change their minds and revise their beliefs in response to new information. Interestingly, there is some evidence that actively open-minded thinking is a teachable skill. (location 3300)
(Incidentally, if you read that description of an actively open-minded person carefully, you'll note that there's one profession where that trait is not only encouraged, but it is essential: scientists!)
The second half of the book discusses how to get rid of noise, or at least reduce it. Much like "how to lose weight," you may find that you already know most of the techniques, and are already using them in some arenas (such as hiring and interviewing):
- Structure your decisions. By splitting a decision into multiple facets, deciding on criteria, and rating each facet separately, you prevent the halo effect, where one particularly outstanding facet overshadows your ability to assess the other facets independently. (When interviewing candidates, have each interviewer focus on a different facet.)
- Humans are better at relative comparisons (ranking) than at absolute ratings. It is far better to give people a few instances to compare against than to try to construct a scale that everyone agrees on. For instance, you might think that on a scale of 1 to 10, a 10 means "in the top 10%," but someone else might never give a 10, because to her a 10 means "perfect," and nothing is ever perfect. But given a list of examples, it's probably easier for two people to agree that X is a better engineer than Y, who is in turn better than Z.
- When it comes to group decisions, ensure that the people assessing the decision are independent of one another. Rather than starting with a round-table discussion, have everyone write down their assessment, and show aggregated/anonymized sentiment charts before the discussion begins. This lets contrarians see that the "groupthink" sentiment might not be as dominant as it seems from a purely verbal discussion, and prevents early speakers' opinions from corrupting later speakers.
- When you don't have a group to make a decision, take advantage of the inconsistency of your own judgement: separate the facets, make your assessments at different times, write them down, and integrate your judgements on different days. This gives you a chance to average out the noise in your own judgement.
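Why averaging helps, whether across a group or across your own judgements on different days: if the individual judgements are independent and unbiased, the error of the average of n judgements shrinks roughly by a factor of sqrt(n). A sketch (my own illustration with made-up numbers, not an example from the book):

```python
import random
import statistics

random.seed(1)
truth = 50.0  # the quantity being judged

def one_judgment():
    # Each judgment is unbiased but noisy (sd = 10).
    return truth + random.gauss(0, 10)

def averaged_judgment(n):
    # Average n independent judgments of the same quantity.
    return statistics.mean(one_judgment() for _ in range(n))

def rmse(estimator, trials=20_000):
    # Root-mean-square error of an estimator over many trials.
    return (sum((estimator() - truth) ** 2 for _ in range(trials)) / trials) ** 0.5

single = rmse(one_judgment)
avg_of_4 = rmse(lambda: averaged_judgment(4))
# Averaging 4 independent judgments cuts the error roughly in half (sqrt(4)).
```

The same arithmetic is why independent written assessments beat a round-table discussion: once opinions are correlated by early speakers, the sqrt(n) benefit largely disappears.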
- Appoint a bias observer with a checklist to look for fallacies in decision making. (There's a sample checklist in the book)
- Treat a one-time decision as though it's a recurring decision that's made once. It's worth the effort to break it down and structure it as though it's going to happen again.
- When picking a team to make decisions, it's better to pick a team with a diverse set of skills than to rank-order people by how good they are as decision makers and just pick the top N.
Many executives object to the notion that nearly all employees can meet expectations. If so, they argue, the expectations must be too low, perhaps because of a culture of complacency. Admittedly this interpretation may be valid, but it is also possible that most employees really do meet high expectations. Indeed, this is exactly what we would expect to find in a high-performance organization. You would not sneer at the leniency of the National Aeronautics and Space Administration’s performance management procedures if you heard that all the astronauts on a successful space mission have fully met expectations. (kindle loc 4192)
a system that depends on relative evaluations is appropriate only if an organization cares about relative performance. For example, relative ratings might make sense when, regardless of people’s absolute performance, only a fixed percentage of them can be promoted—think of colonels being evaluated for promotion to general. But forcing a relative ranking on what purports to measure an absolute level of performance, as many companies do, is illogical. And mandating that a set percentage of employees be rated as failing to meet (absolute) expectations is not just cruel; it is absurd. It would be foolish to say that 10% of an elite unit of the army must be graded “unsatisfactory.” (kindle loc 4196)
In discussing having a decision/bias observer, they note:
A decision observer is not an easy role to play, and no doubt, in some organizations it is not realistic. Detecting biases is useless if the ultimate decision makers are not committed to fighting them. Indeed, the decision makers must be the ones who initiate the process of decision observation and who support the role of the decision observer. We certainly do not recommend that you make yourself a self-appointed decision observer. You will neither win friends nor influence people. (kindle loc 3400)
The authors observe that performance systems at most companies don't just suck, but are actively counterproductive:
if you do measure performance, your performance ratings have probably been pervaded by system noise and, for that reason, they might be essentially useless and quite possibly counterproductive. Reducing this noise is a challenge that cannot be solved by simple technological fixes. It requires clear thinking about the judgments that raters are expected to make. Most likely, you will find that you can improve judgments by clarifying the rating scale and training people to use it consistently. This noise-reduction strategy is applicable in many other fields.
Speaking of Defining the Scale: “We spend a lot of time on our performance ratings, and yet the results are one-quarter performance and three-quarters system noise.” “We tried 360-degree feedback and forced ranking to address this problem, but we may have made things worse.” “If there is so much level noise, it is because different raters have completely different ideas of what ‘good’ or ‘great’ means. They will only agree if we give them concrete cases as anchors on the rating scale.” (kindle loc 4257)
Do I have criticisms of this book? Yes. It's frequently repetitive, and the authors clearly stitched together the book by writing various sections separately. As a result, one section of the book will repeat items from a previous section of the book. By the time you've finished the book, you'll feel as though a dead horse has been both beaten and flogged.
But this is such an important topic with such wide applicability (Which candidate do we hire? Which employee should we promote? Which job offer should you take? Which graduate school do you attend? Which car do you buy?), and current practices are so poor (think about how infrequently we structure even major decisions like an acquisition), that the book is very valuable in forcing you to slow down and think hard about the process of making decisions. The culture today prizes intuition, and the book points out that trying to take intuition out entirely will lead to a backlash and might not be desirable anyway. Instead, the correct approach is to delay the use of intuition until it's been fully informed through a valid process. Only then can intuition lead to your best available decision. The book points out that the process need not be slow, and provides many case studies on how it can be used.
That makes this book important and valuable reading, both in business and in personal life. Highly recommended.