Ratings-based evaluations in lineups may help jurors better evaluate eyewitness testimony. This is the bottom line of a recently published article in Law and Human Behavior. Below is a summary of the research and findings as well as a translation of this research into practice.

Featured Article | Law and Human Behavior | 2017, Vol. 41, No. 4, 375-384

Mock-Juror Evaluations of Traditional and Ratings-Based Eyewitness Identification Evidence


James D. Sauer, University of Tasmania
Matthew A. Palmer, University of Tasmania
Neil Brewer, Flinders University


Compared to categorical identifications, culprit likelihood ratings (having the witness rate, for each lineup member, the likelihood that the individual is the culprit) provide a promising alternative for assessing a suspect’s likely guilt. Four experiments addressed two broad questions about the use of culprit likelihood ratings evidence by mock-jurors. First, are mock-jurors receptive to non-categorical forms of identification evidence? Second, does the additional information provided by ratings (relating to discrimination) affect jurors’ evaluations of the identification evidence? Experiments 1 and 1A manipulated confidence (90% vs. 50%) and discrimination (good, poor, no information) between-participants. Evaluations were influenced by confidence, but not discrimination. However, a within-participant manipulation of discrimination (Experiment 2) demonstrated that evidence of good discrimination enhanced the persuasiveness of moderate levels of confidence, while poor discrimination reduced the persuasiveness of high levels of confidence. Thus, participants can interpret ratings-based evidence, but may not intuit the discrimination information when evaluating ratings for a single identification procedure. Providing detailed instructions about interpreting ratings produced clear discrimination effects when evaluating a single identification procedure (Experiment 3). Across four experiments, we found no evidence that mock-jurors perceived non-categorical identification evidence to be less informative than categorical evidence. However, jurors will likely benefit from instruction when interpreting ratings provided by a single witness.


eyewitness identification, confidence, ratings, juror evaluations of evidence

Summary of the Research

“Jurors perform an important task under difficult conditions: Nonexperts must often assess complex or ambiguous information to reach a decision of significant consequence. This may underlie the persuasiveness of eyewitness identification evidence for jurors. Jurors’ reliance on identification evidence might reflect their desire for an apparently clearcut indication of guilt in a setting often characterized by ambiguity. However, current identification practice confers two major limitations on identification evidence. First, eyewitness identification decisions are prone to error. Second, a categorical identification is not necessarily as informative as it may appear at face value. An identification of the suspect indicates that, of the lineup members presented, the suspect probably provided the best match to the witness’s memory of the culprit. However, it does not speak to how closely the suspect matched the witness’s memory in an absolute sense, or the extent to which the suspect was favored over the other lineup members. This information is important when assessing the diagnostic value of a suspect identification. Thus, although jurors find categorical identification evidence compelling, this relatively coarse index obscures important information relating to the witness’s recognition of the suspect” (p. 375).

“Ratings-based identification protocols avoid categorical identification responses and, instead, have witnesses rate the degree to which each lineup member matches their memory of the culprit. [The authors] examined mock-jurors’ evaluations of this non-categorical form of identification evidence, and whether mock-jurors’ evaluations of identification evidence could benefit from the additional information provided by ratings-based protocols. … results to date consistently show that, compared to standard categorical identification decisions, ratings-based approaches have provided a more sensitive approach to assessing the likely guilt of a suspect” (pp. 375-376).

“[P]rofile analyses … demonstrated that the guilt of the suspect varied almost monotonically according to the degree to which the suspect was favored over the alternative lineup members. … analyses revealed that the potential benefits of ratings-based identification evidence extend beyond improving the reliability of categorical classifications of suspect guilt. In this context, the confidence rating given to the suspect can be conceptualized as an index of recognition for the suspect (ranging from weak to strong), while ratings given to non-suspect lineup members can be conceptualized as indices of discrimination” (p. 376).

“Given (a) the limitations of categorical identification evidence, (b) the promising findings relating to the diagnostic utility of ratings-based evidence, and (c) the capacity for ratings to provide a richer source of information about the witness’s memory for the suspect, we investigated mock-jurors’ evaluations of this novel, ratings-based (cf. categorical) form of identification evidence … In each experiment we presented mock-jurors with a trial transcript containing incriminating identification evidence obtained using either a standard identification task (providing a categorical identification response and associated confidence judgment) or a rating procedure where, for each lineup member, the witness provided a rating indicating the likelihood that that person was the culprit. Thus, in ratings conditions, instead of reading that the witness identified the suspect with a particular level of confidence (e.g., 90%), jurors read that the witness provided a rating for the suspect (e.g., 90%) and for each of the other lineup members (e.g., between 0% and 10%)” (p. 377). Participants across the series of experiments were mostly undergraduate students.
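To make the distinction between recognition and discrimination concrete, the following is a minimal sketch in Python. It is an illustration only and is not drawn from the article: the function name `summarize_ratings` and the discrimination index it computes (the suspect's rating minus the highest filler rating) are assumptions chosen for simplicity, and the filler ratings in the "poor discrimination" example are made-up values.

```python
def summarize_ratings(suspect_rating, filler_ratings):
    """Summarize one witness's lineup ratings (each on a 0-100 scale).

    Returns (recognition, discrimination), where:
      recognition    = the rating given to the suspect
      discrimination = suspect rating minus the highest filler rating
                       (a hypothetical index, not the authors' measure)
    """
    if not filler_ratings:
        raise ValueError("need at least one filler rating")
    recognition = suspect_rating
    discrimination = suspect_rating - max(filler_ratings)
    return recognition, discrimination


# Good discrimination: suspect rated 90%, fillers rated between 0% and 10%.
good = summarize_ratings(90, [0, 5, 10, 0, 5])

# Poor discrimination: same suspect rating, but fillers also rated highly
# (illustrative values only).
poor = summarize_ratings(90, [80, 70, 85, 75, 80])

print(good)  # (90, 80)
print(poor)  # (90, 5)
```

In both cases the suspect receives the same 90% rating, so a categorical identification with 90% confidence would look identical. Only the filler ratings show whether the witness favored the suspect over the other lineup members.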

“Two key findings emerged. First, across experiments, when confidence for the suspect was high, there was no evidence that the absence of a categorical identification decision undermined the persuasiveness of the evidence against the defendant. Thus, mock-jurors did not routinely dismiss noncategorical identification evidence. Second, although mock-jurors did not intuitively grasp the value of the additional information provided in the ratings-based evidence conditions when considering evidence provided by a single witness (Experiments 1 and 1A), information relating to the witness’s ability to discriminate did affect mock-jurors’ evaluations when they were able to compare discrimination across witnesses (Experiment 2). Furthermore, when provided with instructions on interpreting patterns of ratings, mock-jurors were able to apply this information in an adaptive way when evaluating evidence provided by a single witness (Experiment 3)” (p. 382).

“[M]ock-jurors’ evaluations of identification evidence were shaped by the ratings given to other lineup members. These results indicate that, at least under some circumstances, mock-jurors were able to make sensible use of the identification ratings given to other lineup members when evaluating ratings-based identification evidence against the suspect. Specifically, these results suggest that mock-jurors valued information about not only the extent to which the suspect matched the witness’s memory of the culprit, but also the similarity of the suspect to the witness’s memory, relative to other lineup members” (pp. 380-381).

“[W]hen evaluating identification evidence from a single witness, overall, ratings given to non-suspect lineup members had little effect on mock-jurors’ verdict preferences. However, … when discrimination information was manipulated within-subjects, mock-jurors considered ratings given to other lineup members when evaluating identification evidence. Thus, the results … seem to indicate an inability to interpret, rather than unwillingness to consider, ratings for lineup fillers when evaluating identification evidence against a suspect” (p. 381).

Translating Research into Practice

“Compared to categorical identification decisions, ratings-based identification evidence provides a promising alternative method of assessing suspect guilt, and a richer source of information for triers-of-fact assessing the reliability of the identification evidence and the likely guilt of a defendant. Although more empirical work is needed to establish the effectiveness of this technique in police investigations, we believe that a richer type of identification evidence has a variety of potential benefits for the investigative process, prosecutors’ decisions to prosecute suspects, and judges’ summaries of presented identification evidence. However, jurors’ ability to process this information effectively is central to the applied utility of ratings-based identification evidence if presented in court” (p. 382).

“[F]indings may shed light on the conditions under which mock-jurors are likely to make use of information that adds nuance to evaluations of identification evidence. First, mock-jurors may be more likely to consider such information if that information pertains to the witness’s ability to discriminate between memory outputs that are more or less likely to be correct in the present context … Second, and related to the above point, mock-jurors may lack the cognitive framework required to readily intuit the implications of such information for the reliability of the witness’s memory in a given context. It may be necessary to provide instruction to help mock-jurors interpret and apply pertinent information” (p. 382).

“In summary, although our results suggest that mock-jurors did not readily intuit the value of the discrimination information provided for individual witnesses, they also demonstrate that mock-jurors did not immediately dismiss noncategorical identification evidence as uninformative. More importantly, the results show that following instruction mock-jurors applied this additional information sensibly. Given that mock-jurors often experience problems assessing the reliability of identification evidence, we take participants’ responses to the relatively minimalistic instructions used in these experiments as a sign of mock-jurors’ willingness and ability to consider the information provided, and the potential applied utility of ratings-based identification evidence” (p. 383).

Other Interesting Tidbits for Researchers and Clinicians

“Although the additional information provided by ratings-based identification evidence – relating to the relationship between memory quality, discrimination, and the ratings given to a target versus fillers in a lineup – may be readily understood by memory researchers, this may not be true for lay-people. Given the problematic views of memory commonly espoused by jury-eligible samples and decision-makers in the criminal justice system, it may be unrealistic to expect jurors to intuitively recognize the value of the additional information provided by ratings-based evidence (i.e., relating to the witness’s ability to discriminate between previously seen and unseen lineup members). Lay-people may simply lack the cognitive framework necessary to interpret this information. Thus, jurors may be insensitive to the additional information provided by ratings given to fillers in the lineup” (p. 376).

“Mock-jurors found the incriminating testimony more persuasive when the witness expressed high (cf. low) confidence, unless cross-examination revealed that the witness made an error regarding one detail of their testimony … If the witness made an error regarding this detail, confident witnesses were deemed less credible than unconfident ones. These results suggest that, under some circumstances, jurors might be sensitive to information that relates to a witness’s ability to discriminate between correct and incorrect responses” (p. 377).

“[I]t is interesting to note that in the moderate confidence condition, mock-jurors tended to favor ‘not guilty’ verdicts in the standard identification and poor discrimination conditions but ‘guilty’ verdicts in the good discrimination condition. Thus, even when suspect ratings were relatively low (i.e., 50%), jurors still tended to favor ‘guilty’ (cf. ‘not guilty’) verdicts provided the witness demonstrated the ability to discriminate between lineup members” (p. 381).

Authored by Eliza Kopelman

Eliza Kopelman is a first-year master’s student in the Forensic Psychology program at John Jay College. She graduated in 2015 with her B.A. in psychology and English from Brandeis University, and then worked as a community residence counselor at McLean Hospital in Belmont, MA before returning to school. Eliza’s research experience focuses on levels of psychopathy in sex offenders, and her professional interests include crime scene analysis and violence risk assessment.