Unreliable opinions can result in arbitrary or unjust legal outcomes for forensic examinees, as well as diminish confidence in psychological expertise within the legal system. This is the bottom line of a recently published article in Translational Issues in Psychological Science. Below is a summary of the research and findings as well as a translation of this research into practice.

Featured Article | Translational Issues in Psychological Science | 2017, Vol. 3, No. 2, 143-152

Why Do Forensic Experts Disagree? Sources of Unreliability and Bias in Forensic Psychology Evaluations


Lucy A. Guarnera, University of Virginia
Daniel C. Murrie, University of Virginia School of Medicine
Marcus T. Boccaccini, Sam Houston State University


Recently, the National Research Council, Committee on Identifying the Needs of the Forensic Science Community (2009) and President’s Council of Advisors on Science and Technology (PCAST; 2016) identified significant concerns about unreliability and bias in the forensic sciences. Two broad categories of problems also appear applicable to forensic psychology: (1) unknown or insufficient field reliability of forensic procedures, and (2) experts’ lack of independence from those requesting their services. We overview and integrate research documenting sources of disagreement and bias in forensic psychology evaluations, including limited training and certification for forensic evaluators, unstandardized methods, individual evaluator differences, and adversarial allegiance. Unreliable opinions can result in arbitrary or unjust legal outcomes for forensic examinees, as well as diminish confidence in psychological expertise within the legal system. We present recommendations for translating these research findings into policy and practice reforms intended to improve reliability and reduce bias in forensic psychology. We also recommend avenues for future research to continue to monitor progress and suggest new reforms.


Keywords: forensic evaluation, forensic instrument, adversarial allegiance, human factors, bias

Summary of the Research

“Imagine you are a criminal defendant or civil litigant undergoing a forensic evaluation by a psychologist, psychiatrist, or other clinician. The forensic evaluator has been tasked with answering a difficult psycholegal question about you and your case. For example, ‘Were you sane or insane at the time of the offense? How likely is it that you will be violent in the future? Are you psychologically stable enough to fulfill your job duties?’ The forensic evaluator interviews you, reads records about your history, speaks to some sources close to you, and perhaps administers some psychological tests. The evaluator then forms a forensic opinion about your case—and the opinion is not in your favor. You might wonder whether most forensic clinicians would have reached this same opinion. Would a second (or third, or fourth) evaluator have come to a different, perhaps more favorable conclusion? In other words, how often do forensic psychologists disagree? And why does such disagreement occur?” (p. 143-144)

“While forensic evaluators strive for objectivity and seek to avoid conflicts of interest, a forensic opinion may be influenced by multiple sources of variability and bias that can be powerful enough to cause independent evaluators to form different opinions about the same defendant” (p. 144).

“Interrater reliability is the degree of consensus among multiple independent raters. Of particular interest within forensic psychology is field reliability—the interrater reliability among practitioners performing under routine practice conditions typical of real-world work. In general, the field reliability of forensic opinions is either unknown or far from perfect” (p. 144).
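
To make the idea of interrater reliability concrete, here is a minimal sketch of Cohen's kappa, one common chance-corrected agreement statistic for two raters making categorical judgments. The quoted passage does not name a specific statistic, so the choice of kappa is an assumption made for illustration, and the two evaluators' opinions below are invented example data, not results from the article.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must rate the same non-empty set of cases.")
    n = len(ratings_a)

    # Proportion of cases on which the two raters gave the same opinion.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance, from each rater's own category frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("Kappa is undefined when both raters use a single category.")
    return (observed - expected) / (1 - expected)


# Hypothetical competency opinions from two evaluators on the same ten defendants.
evaluator_a = ["competent", "competent", "incompetent", "competent", "incompetent",
               "competent", "competent", "incompetent", "competent", "competent"]
evaluator_b = ["competent", "competent", "incompetent", "incompetent", "incompetent",
               "competent", "competent", "competent", "competent", "competent"]

kappa = cohens_kappa(evaluator_a, evaluator_b)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this made-up example the evaluators agree on 8 of 10 cases (80%), yet kappa is only about 0.52, because two evaluators who each call most defendants competent would agree fairly often by chance alone.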

“Besides the unreliability that may be intrinsic to a complex, ambiguous task such as forensic evaluation, research has identified multiple extrinsic sources of expert disagreement. One such source is limited training and certification for forensic evaluators. While specialized training programs and board certifications have become far more commonplace and rigorous since the early days of the field in the 1970s and 1980s, the training and certification of typical clinicians conducting forensic evaluations today remains variable and often poor” (p. 145).

“This training gap is important because empirical research suggests that evaluators with greater training produce more reliable forensic opinions” (p. 145).

“One likely reason why training and certification increase interrater reliability is that they promote standardized evaluation methods among forensic clinicians. While there are now greater resources and consensus concerning appropriate practice than even a decade ago, forensic psychologists still vary widely in what they actually do during any particular forensic evaluation… This diversity of methods—including the variety and at times total lack of structured tools—is likely a major contributor to disagreement among forensic evaluators” (p. 146).

“Even within the category of structured tools, research shows that forensic assessment instruments with explicit scoring rules based on objective criteria yield higher field reliability than instruments involving more holistic or subjective judgments” (p. 146).

“In addition to evaluators’ inconsistent training and methods, patterns of stable individual differences among evaluators—as opposed to mere inaccuracy or random variation—seem to contribute to divergent forensic opinions… Stable patterns of differences suggest that evaluators may adopt idiosyncratic decision thresholds that consistently shift their forensic opinions or instrument scores in a particular direction, especially when faced with ambiguous cases” (p. 146).

“Upon these concerns about unknown or less-than-ideal field reliability of forensic psychology procedures, we now add concerns about forensic experts’ lack of independence from those requesting their services. As far back as the 1800s, legal experts have lamented the apparent frequency of scientific experts espousing the views of the side that hired them (perhaps for financial gain), leading one judge to comment, ‘[T]he vicious method of the Law, which permits and requires each of the opposing parties to summon the witnesses on the party’s own account[,] . . . naturally makes the witness himself a partisan’. More modern surveys continue to identify partisan bias as judges’ main concern about expert testimony, citing experts who appear to “abandon objectivity” and “become advocates” for the retaining party” (p. 147).

Translating Research into Practice

“While many clinicians cite introspection (i.e., looking inward in order to identify one’s own biases) as a primary method to counteract personal ideology, idiosyncratic responses to examinees, and other individual differences, research suggests that introspection is ineffective and may even be counterproductive. Thus, more disciplined changes to personal practice are needed. For example, when conducting evaluations for which well-validated structured tools exist, evaluators could commit to using such tools as a personal standard of practice. This would entail justifying to themselves (or preferably colleagues) why they did or did not use an available tool for a particular case. Practicing forensic evaluators could also use simple debiasing methods to counteract confirmation bias, such as the ‘consider-the-opposite’ technique in which evaluators ask themselves, ‘What are some reasons my initial judgment might be wrong?’ To increase personal accountability, evaluators could keep organized records of their own forensic opinions and instrument scores, or even help organize larger databases for evaluators within their own institution or locality. Using these personal data sets, evaluators might look for mean differences in their own instrument scores when retained by the prosecution versus the defense, or compare their own base rates of incompetency and insanity findings to those of their colleagues. Ambitious evaluators could even experiment with blinding themselves to the source of referral in order to counteract adversarial allegiance” (p. 149).
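
The record-keeping suggestion in the passage above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the record fields (`retained_by`, `score`, `opinion`), the helper names, and all of the values are hypothetical and are not drawn from the article or from any real evaluator's data.

```python
from statistics import mean

# Hypothetical personal log: one entry per completed evaluation.
records = [
    {"retained_by": "prosecution", "score": 26, "opinion": "competent"},
    {"retained_by": "prosecution", "score": 24, "opinion": "competent"},
    {"retained_by": "prosecution", "score": 28, "opinion": "competent"},
    {"retained_by": "defense", "score": 20, "opinion": "incompetent"},
    {"retained_by": "defense", "score": 22, "opinion": "competent"},
    {"retained_by": "defense", "score": 21, "opinion": "incompetent"},
]


def mean_score_by_side(records):
    """Average instrument score, grouped by the party that retained the evaluator."""
    scores_by_side = {}
    for record in records:
        scores_by_side.setdefault(record["retained_by"], []).append(record["score"])
    return {side: mean(scores) for side, scores in scores_by_side.items()}


def base_rate(records, opinion):
    """Proportion of all logged evaluations that ended in the given opinion."""
    return sum(record["opinion"] == opinion for record in records) / len(records)


means = mean_score_by_side(records)
difference = means["prosecution"] - means["defense"]
incompetency_rate = base_rate(records, "incompetent")

print(f"Mean score by retaining side: {means}")
print(f"Prosecution minus defense: {difference}")
print(f"Personal base rate of incompetency findings: {incompetency_rate:.2f}")
```

A gap between the two means in a small personal data set is a prompt for closer review rather than proof of adversarial allegiance, since the cases referred by each side may differ in ways unrelated to the evaluator.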

“Although individual evaluators can make many voluntary changes today in order to reduce the impact of unreliability and bias on their forensic opinions, other reforms require wider-ranging structural transformation. For example, state-level legislative action is needed to mandate more than one independent forensic opinion. Requiring more than one independent opinion is a powerful way to combat unreliability and bias by reducing the impact of any one evaluator’s error” (p. 149).

“Even slower to change than state legislation and infrastructure might be existing legal norms, such as judges’ current willingness to admit nonblinded, partisan experts. While authoritative calls to action like the NRC and PCAST reports may have some influence, most legal change only happens by the accretion of legal precedent, which is a slow and unpredictable process” (p. 149-150).

Other Interesting Tidbits for Researchers and Clinicians

“Foundational research should establish field reliability rates for various types of forensic evaluations in order to assess the current situation and gauge progress toward improvement. Only a handful of field reliability studies exist for a few types of forensic evaluations (i.e., adjudicative competency, legal sanity, conditional release), and virtually nothing is known about the field reliability of other types of evaluations, particularly civil evaluations” (p. 144-145).

“Given that increased standardization of forensic methods has the potential to ameliorate multiple sources of unreliability and bias described here, more investigation of forensic instruments, checklists, practice guidelines, and other methods of standardization is a second research priority. Some of this research should continue to focus on creating standardized tools for forensic evaluations and populations for which none are currently available, particularly civil evaluations such as guardianship, child protection, fitness for duty, and civil torts like emotional injury. Future research can also continue to seek improvements to the currently modest predictive accuracy of risk assessment instruments. However, given the current gap between the availability of forensic instruments and their limited use by forensic evaluators in the field, perhaps more pressing is research on the implementation of forensic instruments in routine practice. More qualitative and quantitative investigations of how instruments are administered in routine practice, why instruments are or are not used, and what practical obstacles evaluators encounter are needed. Without greater understanding of how instruments are (or are not) implemented in practice—particularly in rural or other under-resourced areas—continuing to develop new tools may not translate to their increased use in the field” (p. 148).

“A clear recommendation for improving evaluator reliability is that states without standards for the training and certification of forensic experts should adopt them, and states with weak standards (e.g., mere workshop attendance) should strengthen them. What is less clear, however, is what kinds and doses of training can improve reliability with the greatest efficiency. Drawing from extensive research in industrial and organizational psychology, credentialing requirements that mimic the type of work evaluators do as part of their job (e.g., mock reports, peer review, apprenticing) may foster professional competency better than requirements dissimilar to job duties (e.g., written tests). Given that both evaluators and certifying bodies have limited time and resources, research into the most potent ingredients of successful forensic credentialing is a third research priority” (p. 148-149).

Authored by Amanda Beltrani

Amanda Beltrani is a current graduate student in the Forensic Psychology Master's program at John Jay College of Criminal Justice in New York. Her professional interests include forensic assessments, specifically criminal matter evaluations. Amanda plans to continue her studies in a doctoral program after completing her Master's degree.