A landmark study published this summer in the journal Science estimated the "reproducibility" of psychological research. It rightly received massive media attention, much of it centered on questions of whether research in psychology should be trusted. But the research in that field is not alone in being questioned.
Academic research, especially in the social sciences, is undergoing a profound change today that is born of a moment of crisis about the trustworthiness of research findings. There has been increased scrutiny over when we "know" what we think we know. Such scrutiny includes questions about whether a single study can or should serve as a definitive answer to a question, as well as about how statistics should be used and interpreted. As a colleague once asked us, "Would you set national policy based on the results of a single study?" We would not, nor should anyone else.
Education isn't ignoring these issues, but such questions do not yet dominate our education research discussions. Our 2014 paper in Educational Researcher, "Facts Are More Important Than Novelty: Replication in the Education Sciences," spurred significant discussion, but more than discussion is needed.
Education can make a singular contribution to the evolution of how social-science research is conducted and interpreted. This is because the field has vast experience in an area closely related to replication: program evaluation. Despite generally being conducted with the best of intentions, some replication attempts are being met by a considerable and growing anti-replication backlash, accusations of bullying, and even concerns about possible gender or race bias in replication. These are exactly the issues that often come up in discussions of education program evaluation. So we can pull lessons from evaluation to make sense of the growing paradox surrounding replication: How can something almost universally acknowledged to be valuable be so often reviled and controversial?
As researchers who have been involved in both replication research and program evaluation, we believe that if replication is viewed as a special case of evaluation, members of the education community can (and should) lead the charge on using replication to improve scientific research. What follows are some lessons we've pulled from years of evaluating education programs, or, more to the point, lessons about the psychology of program evaluation. They are not meant to be exhaustive, but rather a springboard for more discussion about replication within the education sciences.
• No One Likes to Be Evaluated.
Everyone tends to be a fan of evaluation ... until their own work is the focus of those verification efforts. When one of us was involved with an evaluation of a federal agency, most of the staff members were helpful and congenial; ironically, the unit within the agency tasked with promoting rigorous program evaluation was the most resistant to being evaluated. It's just not human nature to welcome an external evaluation with open arms, and replication is no different.
• A Weak Defense Is Often Worse Than No Defense.
Because of the aversion to evaluation, a common response by someone whose work is being evaluated is, "But we've already been evaluated!" On closer inspection, these previous evaluations often turn out to be self-evaluations, evaluations conducted with or by close colleagues, or evaluations based on satisfaction surveys (that is, a low level of evidence). This defensiveness weakens one's arguments from the start and should be avoided. As replications slowly become more common in the social sciences, we have observed similar knee-jerk responses (such as complaints that "my study has already been replicated," when, on closer examination, that proves not to be the case). The best compliment for anyone's research can be found in multiple, independent replications of the original study. That may not be fun for the researcher, but that's science.
• Don't Be a Jerk.
The motivation behind the vast majority of replications we've seen is to conduct sound scientific inquiry. However, any expectation on the part of replicators that the replicatee will be thrilled to have his or her work evaluated is probably naïve, especially if that researcher is approached in a manner that could be interpreted as hostile. An evaluator whose goal is to prove someone wrong is not one who will be well received, but an evaluator seeking to understand what is (or isn't) happening, in an open and fair manner, will be much more welcome. Yet, in other fields, there have been instances of poor judgment, in which replicators have discussed their largely negative results on blogs in unfortunate tones. To paraphrase The Dude from the movie "The Big Lebowski," they're not wrong, they're just jerks. Honest, rigorous evaluation is an essential component of academic research. Being a jerk, gloating, and bragging do not need to be part of the research process, and can work against the effectiveness of a replication attempt.
• Replication Isn't Easy.
Just as there are best practices for conducting a program evaluation, standards for conducting replications should be established. Replication procedures, however, are still in their relative infancy. Many scholars, including the psychologist and Nobel laureate Daniel Kahneman, have recently proposed a new etiquette for replications, suggesting that replicators must make a "good-faith effort to consult with the original author" and then report this correspondence along with the final manuscript, so that reviewers can integrate it into their process of assessing the replication. Under this approach, original authors who are not responsive or helpful cannot tank the replication, and replicators who don't accurately replicate the original methods are identified before publication.
These suggestions, in the main, make sense to us. But we aren't convinced that a formal partnership is necessary. We are both in the process of conducting replications of major studies within our fields of interest. In both cases, we approached the original authors to let them know we loved their studies and wanted to replicate them—not out of any sense that they are wrong or fraudulent, but because their results, if replicated successfully, are potentially very important. Both sets of authors responded enthusiastically. If one treats an evaluation as an aggressive exercise, things will not go well. Replication is no different.
• Don't Judge a Book by Its Cover.
Others have suggested that the relative inexperience of a researcher could be associated with failure to replicate original findings, with the implication that graduate students and junior faculty members should not conduct replications. Experience can be helpful, to be sure, but casting aspersions on entire groups of researchers is the type of argument social scientists typically spend their careers fighting, not propagating. There are plenty of weak evaluators out there, and it stands to reason that there are also plenty of weak replicators. But those least entrenched in a field can often be the best at identifying potentially fatal flaws in research findings. And what better way to learn methods than to replicate seminal studies?
• Results Are Rarely Appreciated at the Time.
The results of a program evaluation are often underappreciated when the study is concluded. This is especially true when the report contains constructive criticism and recommendations for significantly improving the program in question. But after a period of time—weeks, months, or even years—people gain emotional distance from the recommendations, take them much less personally, and view the suggestions for improvement in a new light. The same will likely be true with replications.
Replication is a critical, if underused, part of the scientific process. It has become both more popular and more controversial recently, but we should not allow the controversies to outweigh the many benefits for education. Because inaccurate findings pollute the scientific environment, the goal of a good replicator should be to identify these pollutants so that they can be removed, while tacitly acknowledging that one person's pollutant may be another person's life's work and passion. We hope other education researchers join us in our fight to change the research climate to one that encourages clean and kind replications.