FOREMAN. . . . Nine . . . ten . . . eleven . . . That’s eleven for guilty. Okay. Not guilty? (EIGHT’s hand is raised.) One. Right. Okay. Eleven to one, guilty. Now we know where we are.
THREE. Somebody’s in left field. (To EIGHT) You think he’s not guilty?
EIGHT (quietly). I don’t know.
THREE. I never saw a guiltier man in my life. You sat right in court and heard the same thing I did. The man’s a dangerous killer. You could see it.
EIGHT. He’s nineteen years old.
THREE. That’s old enough. He knifed his own father. Four inches into the chest. An innocent nineteen-year-old kid. They proved it a dozen different ways. Do you want me to list them?
TEN (to EIGHT). Well, do you believe his story?
EIGHT. I don’t know whether I believe it or not. Maybe I don’t.
SEVEN. So what’d you vote not guilty for?
EIGHT. There were eleven votes for guilty. It’s not so easy for me to raise my hand and send a boy off to die without talking about it first.
SEVEN. Who says it’s easy for me?
EIGHT. No one.
SEVEN. What, just because I voted fast? I think the guy’s guilty. You couldn’t change my mind if you talked for a hundred years.
EIGHT. I don’t want to change your mind. I just want to talk for a while.
Twelve Angry Men
We operate under a jury system in this country, and as much as we complain about it, we have to admit that we know of no better system, except possibly flipping a coin.
Generally speaking, we can observe that the scientists in any particular institutional and political setting move as a flock, reserving their controversies and particular originalities for matters that do not call into question the fundamental system of biases they share.
When legal juries convene, judges and attorneys are (or, at least should be) highly aware of the cognitive biases that affect rational decision-making. Competent attorneys seek to exacerbate or mitigate these biases to benefit their clients, sometimes aided by jury consultants, and competent judges strive for due process. These are highly skilled professionals who have developed considerable knowledge [just search online for ‘jury decision making psychology’, ‘litigation psychology’, ‘legal psychology’, and so on].
When academic juries convene – as in review of funding applications or promotion cases – the deliberation is often uninformed by any consideration of possible cognitive bias. To the contrary, the jurors assume that – because they are academic faculty, having been selected for their training, accomplishment, and critical ability – they are capable of completely rational decision-making and immune to cognitive biases. To accuse an academic peer reviewer of bias is a serious charge indeed. As for knowledge about the impact of cognitive biases in peer review, aside from implicit bias with respect to the typical axes of diversity, scholarly works are few and far between. van Arensbergen et al. is the best recent review.
What do we know about group decision-making, such as in legal juries, that might also interfere with academic peer review, and what might be done about it?
In both cases, the ideal is to apply a standard (the law; academic contribution and promise) consistently to the data, in the same way that trained graders of gemstones or judges of animal breeds can, working in isolation, arrive at identical assessments. In both cases we stray far from the ideal because (a) we are judging people, and (b) the jurors are people, and the judging is not private.
‘Judging people’ means that, because no two people are the same, the data under consideration are never identical (cf. ‘comparing apples and oranges’). As the Physics Department at MIT correctly states, “There are as many different successful paths to tenure as there are tenured faculty members…”
‘The jurors are people’ means that they have normal human minds, which often jump to conclusions in predictable ways (or, in techspeak, use heuristics). Often these jumps to conclusions are useful shortcuts, but they can bias assessments in characteristic ways, including:
- Framing. The way in which information is presented influences assessment. For example, “70% of her papers appeared in high-impact journals” and “30% of her papers did not appear in high-impact journals” are quantitatively identical, but the latter is interpreted more negatively than the former.
- Anchoring (‘if it costs that much, it must be good’). A form of framing in which the first information given influences the interpretation of subsequent information. In real estate or automobile sales, if the asking price or list price is given first, subsequent negotiations will proceed from this price rather than a true estimate of value. If peer reviewers first learn that a candidate is highly/poorly cited, a full professor vs. a lower rank, from a prestigious vs. second-tier institution, the assessment can differ accordingly.
- Hindsight bias (‘anybody could have seen it coming’): For both promotion and funding applications, reviewers may alter their assessment according to whether flaws were or are foreseeable (and so could have been anticipated, if not avoided or prevented), or not. We are biased to overestimate foresight and to blame others for not foreseeing what, in hindsight, appears obvious (cf. ‘Monday morning quarterbacking’).
- Confirmation bias (‘my mind is made up; don’t confuse me with the facts’): We overemphasize data that confirm preliminary conclusions, and tend to ignore data that contradict them.
- Egocentric bias (‘I get it; therefore everyone else should’). We assume what is comprehensible, easy, or obvious to us is comprehensible, easy, or obvious to others. Therefore we neglect to communicate appropriately to others, who are baffled or reach wrong conclusions. [This is more a problem for those being assessed than those doing the assessing.]
- Recency bias (‘but that was then’): In assessment, we tend to overweight recent accomplishments and information, and underweight earlier accomplishments and information, all else equal.
- Presentation bias (looks good, therefore is good; looks bad, therefore is bad): As Chia-Jung Tsay writes at the start of her PNAS paper: “We do judge books by their covers. We prefer the nicely wrapped holiday gifts, fall in love at first sight, and vote for the politician who looks most competent.” The way in which academic accomplishments are presented can influence our assessment of their merit.
‘Not private’, as in ‘the jurors are people, and the judging is not private’, means that judges and jurors are just as much on trial as those being assessed [well, perhaps not as much, because the judges and jurors have typically already been promoted, but on trial nonetheless]. The academic judges and jurors are under pressure to seem fair, reasonable, knowledgeable, critical, and deferential to high academic standards in the eyes of their colleagues. They are members of communities in which stature and access to resources depend in part on how they are perceived by other members. They are automatically and naturally susceptible to implicit biases, snobbery, envy, and a desire for retribution, but often must behave before their colleagues as if these tendencies don’t exist. These pressures can deform academic due process and judgment. For example:
A man walks into a room. Seven others are already seated there, and all are told they are to engage in a judgment. They are shown two sets of lines and asked: “Which line on the right is the same as the one on the left?” All give the correct answer. The judgment is repeated several times with different sets of lines until it is routine. Then, by prearrangement, the seven already in the room start giving a conspicuously wrong answer. At first, the man disagrees. Eventually, however, the man joins the majority in giving the wrong answer.
Our minds are highly attentive to what others think and do, might think and do, and might think of us and do to us. This tendency can exacerbate existing biases or introduce additional biases when peer review is not completely private. These include:
- Anchoring is described above. In any group, typically the first statement made anchors the discussion. That is, if the first speaker supports or opposes promotion/funding, the subsequent discussion proceeds in relation to this first statement. If on the old NIH rating scale the first reviewer proposes a score of 2.0, others might disagree by proposing scores of 1.7 to 2.3. If the first reviewer had instead proposed 3.0, others might disagree with scores of 2.7 to 3.3. Many promotion reviews occur in multiple steps (e.g., department, promotions committee, dean, and provost); the early steps tend to anchor the later steps.
- Social Influence/Groupthink/The Bandwagon Effect/The Abilene Paradox: The desire for concurrence, consensus, and cohesion in a group can temper or even overwhelm independent assessment. That is, participants in peer review will alter their independent assessments so as to conform to a majority opinion. This undermines ‘the wisdom of crowds’. As in the Asch experiment, those holding a minority judgment can be tempted to mute or abandon it because they are in the minority rather than because the judgment is incorrect. Those in the majority can be overly resistant to challenges to their conclusion and prone to confirmation bias.
- Social anxiety: fear of displaying stupidity, low standards, limited expertise, poor judgment, or lack of confidence. Such displays undermine status in academia, and with it access to resources and resistance to threats. When their judgments can become known, assessors may express no judgment, or may pretend to agree with the majority or with seemingly assertive, authoritative assessors, to avoid ‘looking bad’ in the eyes of their peers.
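The claim above that conformity undermines ‘the wisdom of crowds’ can be sketched with a toy simulation. Everything here is an illustrative assumption (the ‘true’ score, the noise level, the blending rule for social influence), not data: jurors who report private estimates independently average out their errors, whereas jurors who blend their estimate with the opinions already voiced let the first speaker’s noise dominate the panel mean.

```python
import random

random.seed(1)

TRUE_SCORE = 5.0   # hypothetical "true" merit of an application (assumption)
N_JURORS = 9
NOISE = 1.5        # spread of each juror's private estimate (assumption)
TRIALS = 2000

def independent_panel():
    """Each juror reports a private, independent estimate."""
    return [random.gauss(TRUE_SCORE, NOISE) for _ in range(N_JURORS)]

def conforming_panel(weight=0.7):
    """Jurors speak in turn; each blends a private estimate with the
    mean of the opinions already voiced (a crude model of social
    influence; the 0.7 weight is arbitrary)."""
    voiced = []
    for _ in range(N_JURORS):
        private = random.gauss(TRUE_SCORE, NOISE)
        if voiced:
            public = (1 - weight) * private + weight * sum(voiced) / len(voiced)
        else:
            public = private  # the first speaker anchors the discussion
        voiced.append(public)
    return voiced

def mean_abs_error(panel_fn):
    """Average error of the panel mean over many simulated panels."""
    errors = []
    for _ in range(TRIALS):
        panel = panel_fn()
        errors.append(abs(sum(panel) / len(panel) - TRUE_SCORE))
    return sum(errors) / len(errors)

print("independent panels, mean error:", round(mean_abs_error(independent_panel), 3))
print("conforming panels, mean error: ", round(mean_abs_error(conforming_panel), 3))
```

Under these assumptions the conforming panels are reliably further from the ‘true’ score than the independent ones, because conformity shrinks the effective number of independent opinions toward one: the first speaker’s.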
Admittedly these impacts are largely extrapolated to academic assessment from work in other contexts and, as noted, studies of academic peer review are few. These works suggest that extrapolation is not unwarranted, however.
What can be done to minimize the influence of these cognitive biases and tendencies?
√ Education. As has been stated, in academia reputation and stature are linked to the demonstration of sound judgment and wisdom free of bias, and simply knowing that fellow judges are on the lookout for bias ought to deter it. A precondition is that the fellow judges must know what biases to look for, and individuals must know what biases to avoid displaying.
√ Leadership of peer review groups. Those who chair such groups should be painfully aware that, for example, those who speak first and those who speak last will be influential (serial position effect), that an assertive high-status person who speaks first will anchor the discussion, that social anxiety will squelch valid points of view, and so on. Chairs can routinely ask: “How may cognitive biases and tendencies have influenced our judgment in inappropriate ways?” [If this is routine, it is not an accusation – and accusations can backfire.] At my institution chairs are coached to call on committee members in random order and to call on members who are reticent to speak spontaneously. [This has a collateral benefit of inducing all members to study each case carefully and discourages social loafing.]
√ Avoid framing and anchoring. In an ideal world, all framing (identities and affiliations of the candidate, authorship position and co-authors, titles of publications outlets, honorifics, etc.) would be redacted from review materials so that reviewers would be forced to focus on the unadorned work itself. This is likely impossible [although at my institution one ambitious department did its initial screen of applicants for a position by considering only the abstracts of exemplary publications – with bibliographic and other identifying information redacted]. But some framing/anchoring can be withheld. For example, in promotion discussions at my institution we used to begin with the de-identified preliminary scoring of the members; we no longer do.
√ Empower all reviewers. Many disasters are due to groups not working together and members’ reticence to express concerning observations to their superiors. To combat this in aviation, crews practice cockpit resource management or CRM.
CRM aims to foster a climate or culture in which the freedom to respectfully question authority is encouraged. It recognizes that a discrepancy between what is happening and what should be happening is often the first indicator that an error is occurring. This is a delicate subject for many organizations, especially ones with traditional hierarchies. Appropriate communication techniques must therefore be taught to supervisors and their subordinates, so that supervisors understand that questioning authority need not be threatening, and subordinates understand the correct way to question orders. These are often difficult skills to master, as they may require significant changes in personal habits, interpersonal dynamics, and organizational culture.
The way this might work in peer review is:
- Opening or attention getter – Address the individual. “Hey Chair,” or “Professor Smith,” or “Jane,” or whatever name or title will get the person’s attention.
- State your concern – Express your analysis of the situation in a direct manner while owning your emotions about it. “I’m concerned that we are being unfair to this candidate,” or “I’m worried that the discussion is going off track.”
- State the problem as you see it – “We are overemphasizing the letters of reference, which are subject to implicit bias,” or “We are being unduly influenced because the paper is in Science, and are not fully examining its contribution.”
- State a solution – “Let’s look carefully at the adjectives in the letters,” or “Can someone explain why the work in Science is a significant contribution?”
- Obtain agreement (or buy-in) – “Does that sound good to you, Ms. Chairperson?”
√ Voting. Always vote by private, if not secret, ballot or its equivalent. The rationale is obvious.
Cognitive bias in peer assessment is, one hopes, rare and of little influence. Prudence dictates that we expect it, combat it, and be pleasantly surprised when it doesn’t occur. The one strategy that is always ineffective is to pretend it doesn’t exist.
 Rose, Reginald. 1955. Twelve Angry Men. http://amzn.com/1417812656 https://docs.google.com/document/d/1irVXTuMAQESSwtoqOtQiC_-5dZa59LCmOxA_IQzlxww/edit
 Gunnar Myrdal, Objectivity in Social Research, as cited in Klein, DB and Stern, C. 2009. Groupthink in academia: majoritarian departmental politics and the professional pyramid. Independent Review 13: 585-600.
 An outstanding source is http://kirwaninstitute.osu.edu/wp-content/uploads/2014/03/2014-implicit-bias.pdf
 van Arensbergen, P, van der Weijden, I, and van den Besselaar, P. 2014. The selection of talent as a group process. A literature review on the social dynamics of decision making in grant panels. Research Evaluation 23: 298-311. doi: 10.1093/reseval/rvu017 http://www.vandenbesselaar.net/_pdf/2014%20Prpic.pdf
 ‘Might’ ≠ ‘will’. In my experience, the vast majority of peer assessment leads to sound judgment. This does not excuse us from increasing the size of this majority and reducing the incidence of misjudgments rooted in cognitive biases, etc., however. Informing ourselves about these biases is a necessary first step. My institution is committed to reducing if not eliminating the influence of cognitive bias on promotion decisions.
 The stages of truth:
- It’s not possible
- It’s possible but either impossible to test or not worth doing
- It’s obvious and we knew it all along
https://cs.uwaterloo.ca/~shallit/Papers/stages.pdf is an exhaustive and masterful review of the concept. Hindsight bias is the third stage.
 Although there must be a pre-existing name for this bias in the literature, I do not know it.
 Tsay, Chia-Jung. 2013. Sight over sound in the judgment of music performance. Proceedings of the National Academy of Sciences USA 110: 14580-14585. More general discussion in http://www.theatlantic.com/business/archive/2013/09/the-science-of-snobbery-how-were-duped-into-thinking-fancy-things-are-better/279571/
 https://en.wikipedia.org/wiki/Asch_conformity_experiments. The Candid Camera counterpart is at https://vimeo.com/61349466
 Perceptive readers will recognize this statement as an example of framing and anchoring. Humans are not that susceptible to normative social influence, but are susceptible nonetheless.
“…when group members know who the experts are in reference to a specific task, they will adjust their group decision to the decision of the experts… …in general high-status members talk more and receive more attention from other members. Low status members generally talk less or even do not talk at all when their opinions deviate from those of high-status members. This can harm decision making processes because not all true opinions are expressed and high-status people will not be contradicted often… members of cohesive groups may want to preserve the group’s relationships and therefore avoid any kind of behavior considered to be harmful. This could mean that people agree to group decisions while they actually do not agree with it individually… …people paid more attention to preference-consistent information than to information that conflicted with their preferences. This effect was even stronger when confirming information was introduced by the person himself than by other group members. Whether people adjust their initial preference based on new information that is contributed to the discussion is strongly influenced by social validation. …people defend their initial preference and in order to convince others they mention more information that supports their preference. But it can also be the result of more unconscious processes: people consider preference-consistent information as more accurate and relevant and therefore pay more attention to it. …panelists may use different strategies or social tactics in processes of decision making, e.g. consultation, pressure, personal appeals, and coalition tactics. The use of social tactics to influence one another is affected by status differences. …Often, opening statements serve as point of reference for all statements being made thereafter. With regard to panel review, this implies the comments of the first reviewer are very influential and set the tone for further discussion. 
Knowledge of these cognitive heuristics can be implemented as social tactics when panelists actively use them to influence negotiation outcomes. …Groupthink is more likely to occur in groups where any degree of accountability is absent. Making individuals accountable is found to be more effective on reducing groupthink tendencies than making them collectively accountable as a panel… …The combination of the large scope of applications to be evaluated and the restricted time available reduces the ambitions of panelists to execute very rigorous reviews. When panels experience strong time pressure, reviewers pay more attention to shared information and less attention to alternatives, consequently resulting in a closing of the mind. People tend to rely more on cognitive heuristics …and are more focused on reaching (cognitive) closure. Therefore, high time pressure is considered an important antecedent for groupthink.”
Klein and Stern (see endnote 3) treat very generally how groupthink may devalue dissenting views, even if well-founded, in academic units. Langfeldt shows that peer review groups may autonomously establish priorities or practices that influence outcomes, or as she puts it “The guidelines given to the panels had little effect on the criteria they emphasized… Put more clearly, panels do as they like…” Recently Park et al. have modeled how peer pressure in review – herding – may stifle innovation (or penalize innovators).
©Martin E. Feder 2015