The Replicability Crisis in Science

Pre-war Bayer heroin bottle, originally containing 5 grams of heroin. (Photo credit: Wikipedia)
The world of science is in the midst of unprecedented soul-searching. The credibility of science rests on the widespread assumption that results are replicable, and that high standards are maintained by anonymous peer review. These pillars of belief are crumbling. In September 2015, the international scientific journal Nature published a cartoon showing the temple of “Robust Science” in a state of collapse. What is going on?

Drug companies sounded an alarm several years ago. They were concerned that an increasing proportion of clinical trials was failing, and that much of their research effort was being wasted. When they looked into the reasons for their lack of success, they realized that they were basing projects on scientific papers published in peer-reviewed journals, on the assumption that most of the results were reliable. But when they looked more closely, they found that most of these papers, even those in top-tier academic journals, were not reproducible. In 2011, researchers at the German drug company Bayer found in an extensive survey that more than 75% of the published findings could not be validated.

In 2012, scientists at the American drug company Amgen published the results of a study in which they selected 53 key papers deemed to be “landmark” studies and tried to reproduce them. Only 6 (11%) could be confirmed.

In 2012, the governments of the world’s richer countries spent $59 billion on biomedical research, one justification being that basic-science research provides the foundations for work by private drug companies. So this is not a trivial problem. Meanwhile, by 2013, in experimental psychology, as in other branches of science, there were alarming signs that much of the published research could not be replicated. A large-scale replication study by psychologists, published last year, sent further shock waves through the scientific world when it turned out that around two-thirds of the published studies from top psychology journals were not reproducible.

In the late nineteenth century, many scientists adopted a style of writing using the passive voice, “A test tube was taken…” instead of “I took a test tube…”, to create as impersonal a style as possible, a world of emotion-free events unfolding spontaneously in front of a detached, objective observer.

In reality, of course, scientists are people, and like other people they have different temperaments and personalities, are often competitive, and prefer their own hypotheses to be right rather than wrong. In most branches of science, scientists publish only a small percentage of their data, 10% or less, and naturally select the “best” results for publication, leaving inconvenient or inconclusive data unpublished. The problem is made worse by a systematic bias against replications within the sciences.

Researchers who replicate other people’s work find it hard, if not impossible, to get their papers published, because replication is not deemed to be original, and most journals pride themselves on publishing original research.

Unfortunately, personal advancement in the world of science depends on incentives that encourage these questionable research practices. Professional scientists’ career prospects, promotions and grants depend on the number of papers they have published, the number of times those papers are cited and the prestige of the journals in which they appear. There are therefore powerful incentives to publish eye-catching papers with striking positive results. If other researchers cannot replicate the results, this may not be discovered for years, if it is discovered at all, and meanwhile the original authors’ careers have advanced and the system perpetuates itself. In the world of business, success is measured by whether a business actually succeeds, not by whether its plans are ranked highly by business academics or frequently cited in business journals. But status in the world of science depends on publications in scientific journals, rather than on practical effects in the real world.

Meanwhile, the peer-review system is falling into disrepute. The very fact that so many unreliable papers are published shows that the system is not working effectively, and a recent investigation by the American journal Science revealed some shocking results. A member of Science’s staff wrote a spoof paper, riddled with scientific and statistical errors, and sent 304 versions of it to a range of peer-reviewed journals. It was accepted for publication by more than half of them.

Obviously the present system of academic research encourages the publication of false positive results. At the same time, the huge financial incentives that underlie the multi-billion-dollar drug industry encourage the suppression of negative results. Many drug companies simply do not publish the results of negative studies showing that their drugs are ineffective, while of course publishing the positive studies that favour their drugs. Insofar as “evidence-based medicine” relies on published studies, it creates a very misleading impression of scientific objectivity, reflecting a strong bias based on the commercial self-interest of pharmaceutical corporations. Such practices are all too common, as Ben Goldacre shows in his book Bad Pharma: How Drug Companies Mislead Doctors and Harm Patients (2012).
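To see why a system that rewards striking positive results generates so many false ones, it helps to look at the arithmetic of significance testing. The sketch below is a pure illustration, with simulated numbers not drawn from any study mentioned here: under a true null hypothesis, p-values are uniformly distributed, so a researcher who runs enough analyses will eventually find a “significant” result by chance alone.

```python
# Illustration only: simulated p-values, not data from any real study.
# Under a true null hypothesis, p-values are uniformly distributed, so
# each test has probability alpha of looking "significant" by chance.
import random

random.seed(1)

def false_positive_rate(n_tests, alpha=0.05):
    """Fraction of true-null tests that come out 'significant'."""
    pvals = [random.random() for _ in range(n_tests)]
    return sum(p < alpha for p in pvals) / n_tests

rate = false_positive_rate(100_000)
print(f"false-positive rate at alpha = 0.05: {rate:.3f}")

# Chance of at least one spurious "finding" if a lab tries 20 analyses:
p_at_least_one = 1 - (1 - 0.05) ** 20
print(f"P(at least one false positive in 20 tests) = {p_at_least_one:.2f}")
```

With twenty independent looks at null data, the chance of at least one false positive is roughly 64% — which is why unreported analyses and unpublished negative results so easily produce a misleading literature.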

The psychologist Nicholas Humphrey has compared this “sub-prime science” crisis to the financial crisis of 2008. The implications of this crisis are far-reaching, because science is so important for our civilisation and economy. There is now an unprecedented mood of humility within the sciences. Whether there will be serious changes, or simply a reversion to business-as-usual, remains to be seen.

Rupert Sheldrake, London, January 14, 2016




For years now, researchers have been warning about a reproducibility crisis in science - the realisation that a lot of seminal papers, particularly in psychology, don't actually hold up when scientists take the time to try to reproduce the results.

Now, two more key papers in psychology have failed the reproducibility test, serving as an important reminder that many of the scientific 'facts' we've come to believe aren't necessarily as steadfast as we thought.

To be fair, just because findings can't be reproduced, it doesn't automatically mean they're wrong. Replication is an important part of the scientific method that helps us nut out what's really going on - it could be that the new researchers did something differently, or that the trend is more subtle than originally thought.

But the problem is that, for decades now, the importance of replicating results has been largely overlooked, with researchers usually choosing to chase a 'new' discovery rather than fact-checking an old one - thanks to the pressure to publish exciting and novel findings in order to secure jobs.

As John Oliver said earlier this year: "There's no Nobel prize for fact-checking."

That's brought us to the 'crisis' we're in now, where most papers that are published can't be replicated. Last year, the University of Virginia led a new Reproducibility Project that repeated 100 experiments... with only one-third of them successfully being replicated - although this study has since been criticised for having its own replication errors.

The two latest examples are widely cited papers from 1988 and 1998.

The 1988 study concluded that our facial expressions can influence our mood - so the more we smile, the happier we'll be, and vice versa.

The 1998 study, led by Roy Baumeister from Case Western Reserve University, provided evidence for something called ego depletion, which is the idea that our willpower can be worn down over time.

The latter assumption has been the basis of a huge amount of follow-on psychological studies, but now Martin Hagger from Curtin University in Australia has led researchers from 24 labs in an attempt to recreate the seminal paper, and found no evidence that the effect exists.

The facial expression replication attempt follows much the same trend.

In the original paper, researchers from Germany asked participants to read The Far Side comics by artist Gary Larson, with either a pen held between their teeth (forcing them into a smile) or between their lips (replicating a pout).

The team found that people who smiled found the comics funnier than those who were pouting, leading the researchers to conclude that changing our facial expression can change our moods, something known as the facial feedback hypothesis.

But when a team of researchers at the University of Amsterdam in the Netherlands conducted the same experiment - even using the same '80s comics - they failed to replicate the findings "in a statistically compelling fashion".

"Overall, the results were inconsistent with the original result," the team conclude in Perspectives on Psychological Science - a separate paper to the ego depletion replication, but also due to be published in a few weeks.

Again, that doesn't necessarily mean that the original result wasn't accurate - nine out of the 17 Dutch labs that attempted to recreate the experiment actually reported a similar result to the 1988 study. But the remaining eight labs didn't, and when the results were combined, the effect disappeared.  
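The pooling step that made the combined effect vanish is, in essence, an inverse-variance weighted average of the labs' effect sizes - a standard fixed-effect meta-analysis. Here's a minimal sketch of that calculation, using invented numbers rather than the actual data from the 17 labs:

```python
# Minimal fixed-effect meta-analysis sketch. The effect sizes below are
# invented for illustration - they are NOT the data from the 17 labs.

def pooled_effect(effects, ses):
    """Inverse-variance weighted combination of per-lab effect sizes."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total
    pooled_se = (1.0 / total) ** 0.5
    return estimate, pooled_se

# Hypothetical scenario: some labs find a positive effect, others a
# negative one, so the pooled estimate lands near zero.
effects = [0.4, 0.3, 0.5, -0.2, -0.35, -0.4, 0.1, -0.3]
ses = [0.2] * len(effects)

est, se = pooled_effect(effects, ses)
print(f"pooled estimate = {est:.3f} (SE {se:.3f})")
```

Individually, several of these hypothetical labs 'find' the effect, yet the pooled estimate sits well within two standard errors of zero - the same pattern the Dutch consortium reported.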

"[T]his does not mean the entire facial feedback hypothesis is dead in the water," writes Christian Jarrett for the British Psychological Society's Research Digest.

"Many diverse studies have supported the hypothesis, including research involving participants who have undergone botox treatment, which affects their facial muscles."

The results could be due to a number of other variables - like, maybe people today don't find The Far Side funny anymore. And the Dutch study also used psychology students, many of whom would have been familiar with the 1988 paper, which could have skewed the results.

Only more investigation will help us know for sure.

But in the meantime, all this hype over the reproducibility crisis in the media lately can only be a good thing for the state of science. 

"It shows how much effort and attention has gone towards improving the accuracy of the knowledge produced," John Ioannidis, a Stanford University researcher who led a 2005 reproducibility study, told Olivia Goldhill at Quartz.

"Psychology is a discipline that has always been very strong methodologically and was at the forefront at describing various biases and better methods. Now they are again taking the lead in improving their replication record."

One positive that's already emerged is a discussion about pre-registering trials, which would stop researchers tweaking their analyses after the data have been collected in order to get more exciting results.

And hopefully, the more people talk and think about replicating results, the better the public will get at thinking critically about the science news they read.

"Science isn’t about truth and falsity, it’s about reducing uncertainty," Brian Nosek, the researcher behind the Reproducibility Project, told Quartz.

"Really, this whole project is science on science: researchers doing what science is supposed to do, which is be skeptical of our own process, procedure, methods, and look for ways to improve." 
