Opinion

The scandals rocking cancer science matter to your health

by F.D. Flam

The field of cancer biology is a mess. Signs of trouble emerged years before the most recent scandal, in which investigators found evidence of data manipulation in a slew of high-profile papers from the Harvard-affiliated Dana-Farber Cancer Institute.

It’s the latest crisis in academic research, where there’s a clear need for better quality control — a tighter filter than peer review. Some researchers suggest that AI could help point out which papers need closer scrutiny.

But to understand what’s going on, we have to understand how we got here. A decade ago, some research watchdogs started raising alarms after scientists found that fewer than half of “landmark” pre-clinical cancer studies — those in top journals — could be replicated.

In 2021, a similar evaluation found that hype is the norm: researchers could reproduce only 50 of 193 experiments, and in those that did replicate, the second try showed much smaller effect sizes — only 15% as big as what had originally been claimed.

These are the kinds of experiments in test tubes or in mice that determine which treatments get tested in people. They also influence how trial subjects are informed about risks and benefits. So the results affect the lives of real people.

While evidence of data tampering — what the Dana-Farber scientists are accused of — is a different problem from irreproducible results, both stem from the same root causes. Scientists gain fame and fortune by obtaining flashy, potentially high-impact findings, but people benefit from findings that are solid and reproducible. We also benefit from findings that show which treatments are unlikely to work, though these are hard to get published.

As Nobel winner William Kaelin warned me back in 2017, biomedical researchers have started making bigger claims with flimsier evidence. (He’s also at Dana-Farber, but his work hasn’t been named in this current scandal.)

Scientists are allowed to make mistakes, of course. But they are supposed to present their data exactly as measured, and any graphs are supposed to represent that data faithfully. Adding, subtracting or changing data without explanation is usually considered an act of fraud.

While the case is still being investigated, Dana-Farber plans to retract six papers and issue corrections in many more. It’s possible that the problems in some of the papers might have been accidental, but there are an awful lot of them — and such errors would still cast doubt on the findings.

Data manipulation is all too common, said Ivan Oransky, co-founder of the blog Retraction Watch. “The part that worries me is we’re going to continue treating this like this weird anomaly, which it isn’t.”

A study that doesn’t replicate, on the other hand, might have been done according to all the rules, but the conclusions aren’t ones you’d want to bet the lives of cancer patients on. The researchers might have misinterpreted their data or the experiment might work only under very specific conditions.

So why hasn’t peer review prevented the publication of weak results and outright fraud? For one, many papers don’t include their raw data, making fraud hard to spot.

But at a deeper level, peer review isn’t the quality control measure many people assume. Some historians trace peer review back to 1830, when English philosopher William Whewell proposed it for papers to be published in a new journal, the Proceedings of the Royal Society of London. In the first attempt, Whewell himself took on the job but couldn’t agree with a second reviewer, thus ushering in a long tradition bemoaned by scientists the world over.

Reviewers often have the expertise to evaluate 90% or 95% of a paper, said Brian Uzzi, a social scientist who studies problems with replication at the Kellogg School of Management at Northwestern University. “You’ll leave that last 5% hoping that the other reviewer is going to pick up on it. But maybe the other reviewer is doing the same thing,” he said. Reviewers are also often pressed for time, overwhelmed by other review requests and their own research obligations.

Uzzi found that in social science, where there’s been a longstanding reproducibility crisis, machine learning can flag the papers most likely to fail attempts at replication. He used data on hundreds of attempted replications to train a system that he then tested on 300 new experiments for which he had replication data. The machine learning system was more accurate than individual human reviewers, as well as inexpensive and almost instantaneous.
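To make the idea concrete, here is a minimal sketch of how a screening tool of that general kind could work: a text classifier trained on papers whose replication outcomes are already known, then used to score new submissions. The features, model and toy data below are illustrative assumptions only, not a description of Uzzi’s actual system.

```python
# Illustrative sketch: train a generic text classifier on papers with known
# replication outcomes, then score unseen papers. All data here is made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training set: paper abstracts paired with replication outcomes
# (1 = the original finding replicated, 0 = it did not).
train_texts = [
    "Large effect of compound X on tumor growth in a single mouse strain.",
    "Modest, consistent effect of drug Y across three independent cell lines.",
]
train_labels = [0, 1]

# TF-IDF text features feeding a logistic-regression classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# Score a new, unreviewed paper: a low score flags it for closer human scrutiny.
new_texts = ["Dramatic survival benefit reported in one small pilot experiment."]
print(model.predict_proba(new_texts)[:, 1])  # estimated probability of replication
```

In practice, any such system would need far richer inputs and careful validation; the point of the sketch is only that the prediction step is cheap and fast once the model is trained, which is what makes a machine first pass plausible.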

Perhaps such systems could help human experts do more to flag sloppy and dishonest work by taking a first pass. They could also help direct overworked reviewers and journal editors away from the famous scientists and institutions who tend to get the most attention and toward important findings by lesser-known teams.

Scientists already create a flood of new research papers, so it wouldn’t hurt to add a new layer of quality control and put more time and money into separating good papers from bad. Otherwise, we will be paying for all that bad research — not only with our tax dollars, but with our health.

F.D. Flam is a Bloomberg Opinion columnist covering science. She is host of the “Follow the Science” podcast.