Psychology’s Largest Replication Effort Reveals Low Rates of Reproducibility

Photo: Wally Gobetz, Andy Warhol’s Campbell’s Soup Cans (NYC – MoMA)

During the past three years, 270 researchers from around the world recreated 98 psychology studies from three top journals to see if they could produce the same results. On the whole, they couldn’t. As they report in Science, fewer than half of the original studies were successfully replicated.

At first blush, these results seem dismal. However, many say this is just the nature of scientific progress—trial, error, and revision.

“I don’t see this story as a pessimistic one,” said Brian Nosek, a psychologist at the University of Virginia who led the project called the Open Science Collaboration. “The project is a demonstration of science demonstrating [its] essential quality—self-correction.”

The aim of the project, which began in 2011, was to estimate the reproducibility of the findings published in Psychological Science, the Journal of Personality and Social Psychology, and the Journal of Experimental Psychology: Learning, Memory and Cognition.

The researchers conducted 100 replications of 98 studies (two studies were replicated twice) and found that only slightly more than a third of the replication attempts resulted in statistically significant effects. This is in stark contrast to the original studies, nearly all of which reported statistically significant effects. In another measure of reproducibility, only about two out of five replication teams answered yes to the question “Did your results replicate the original effect?”

“There has been growing concern that reproducibility may be lower than expected or desired,” Nosek said in a statement to Science. “There are many contributing factors, but an important one is the incentives individual researchers face in trying to survive and thrive as a scientist. Publication is the currency of science.”

Overall, Nosek suggests three main reasons for the discrepancy between the original studies and the replications. It could be that the original research found an effect where none exists, known as a false positive. It could be that the replication team failed to find an effect where one does exist, known as a false negative. Or it could be that a slight variation in methods or conditions between the original study and the replication led to the difference.

However, it’s still not completely clear how to gauge the results. “Science needs to involve taking risks and pushing frontiers, so even an optimal science will generate false positives,” Sanjay Srivastava, a psychologist at the University of Oregon, told The Atlantic. “If 36 percent of replications are getting statistically significant results, it is not at all clear what that number should be.” Likewise, many are still trying to make sense of the finding that cognitive psychology studies replicated at a rate of about 50 percent, while the figure for social psychology stood around 25 percent.

With a number of questions answered and even more raised, Science Editor-in-Chief Marcia McNutt warns against viewing research on replication as settled. “I caution that this study should not be regarded as some last word on reproducibility,” she said in a call with the press. “It’s rather I think a beginning.”

Every aspect of the replication effort (methods, materials, data, and analyses) is shared publicly on the Open Science Framework, a platform designed to promote more open scientific collaboration. The table summarizing the replication results and the project’s home page, which links to all replication reports, are both available there.