Preclinical Cancer Biology Findings Replicate at Less Than Half the Reported Effect Size
The 8-year Reproducibility Project: Cancer Biology found that only 46% of 112 experimental effects from high-impact cancer papers met basic replication criteria, and replicated effects averaged just 15% of the originally reported magnitude. Original positive results were half as likely to replicate as original null results (40% vs. 80%). The project set out to replicate 193 experiments from 53 papers but could complete only 50 experiments from 23 papers, blocked by missing protocols, unavailable reagents, and unresponsive authors. The data needed to compute effect sizes were publicly accessible for just 4 of 193 experiments.
Preclinical cancer biology findings directly inform clinical trial design — they determine which drug targets are pursued, which biomarkers are measured, and which patient populations are enrolled. When preclinical effects are 85% smaller than reported, clinical trials designed around those effect sizes are systematically underpowered. The estimated cost of irreproducible preclinical research in the US is $28 billion annually. Failed clinical trials that never should have been initiated waste resources, delay effective treatments, and erode public trust in biomedical research.
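The underpowering claim can be made concrete with standard power arithmetic. The sketch below uses the normal approximation for a two-sample test; the assumed original effect size (Cohen's d = 0.8) is illustrative, not a figure from the Reproducibility Project, while the 15% shrinkage factor is taken from the brief above.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sample test (normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    noncentrality = d * sqrt(n_per_group / 2)
    return 1 - NormalDist().cdf(z_crit - noncentrality)

def n_for_power(d, power=0.80, alpha=0.05):
    """Per-group sample size for a target power (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return 2 * ((z_a + z_b) / d) ** 2

assumed_d = 0.8            # illustrative effect size taken at face value from a paper
true_d = 0.15 * assumed_d  # replications averaged 15% of the reported magnitude

n = n_for_power(assumed_d)           # trial planned for 80% power at the reported effect
print(round(n))                      # ~25 per group
print(round(two_sample_power(true_d, round(n)), 3))  # actual power at the true effect
```

Under these assumptions, a study planned for 80% power at the reported effect size has roughly 6% power at the true effect, barely above the false-positive rate itself.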
Publication in high-impact journals, peer review, and citation counts were treated as proxies for result validity. The Reproducibility Project revealed that these signals are uncorrelated with replicability. Even when replication was attempted, most original papers lacked the methodological detail needed to carry it out: key descriptive and inferential statistics were missing, protocols were incompletely documented, and requests for data sharing were ignored 68% of the time. Registered reports (pre-registering experimental protocols) have been adopted by some journals but remain a small fraction of cancer biology publications. The fundamental barrier is not fraud; it is that the information infrastructure needed to evaluate or reproduce a finding is systematically not captured or shared.
Several reforms follow directly: mandatory registered reports for preclinical studies with therapeutic implications, with protocols and analysis plans pre-registered before experiments begin; structured data-sharing requirements under which raw data, analysis code, and complete protocols are deposited at submission (analogous to GenBank for sequences); and effect size reporting with power analysis as publication requirements. Critically, replication studies must become publishable and career-valued rather than treated as derivative work. The Registered Reports model adopted by eLife and PLOS provides a template, but field-wide adoption requires institutional incentive reform.
A team could select 5–10 recent high-impact cancer biology papers and assess their replicability using only published information: Can the experiment be fully reconstructed from the methods section? Are key materials (cell lines, antibodies, reagents) identifiable and obtainable? Is the statistical analysis reproducible from reported data? Publishing this "replicability audit" would contribute to the evidence base on research infrastructure gaps. Biostatistics, cancer biology, and science policy skills would be most relevant.
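The audit questions above can be operationalized as a simple scoring rubric. This is a hypothetical sketch: the criteria names, the 0/1/2 scale, and the equal weighting are illustrative choices, not part of the Reproducibility Project's methodology.

```python
from dataclasses import dataclass

# Hypothetical audit criteria mirroring the three questions in the text;
# each is scored 0 (absent), 1 (partial), or 2 (complete).
CRITERIA = (
    "methods_reconstructible",  # experiment fully reconstructible from the methods section
    "materials_identifiable",   # cell lines, antibodies, reagents identifiable and obtainable
    "statistics_reproducible",  # statistical analysis reproducible from reported data
)

@dataclass
class PaperAudit:
    doi: str
    scores: dict  # criterion name -> score in {0, 1, 2}

    def total(self) -> float:
        """Fraction of the maximum possible score, in [0.0, 1.0]."""
        return sum(self.scores.get(c, 0) for c in CRITERIA) / (2 * len(CRITERIA))

# Example audit of one (placeholder) paper.
audit = PaperAudit("10.xxxx/example", {
    "methods_reconstructible": 1,
    "materials_identifiable": 2,
    "statistics_reproducible": 0,
})
print(f"{audit.doi}: {audit.total():.2f}")  # prints "10.xxxx/example: 0.50"
```

Aggregating such scores over 5–10 papers would give the audit a compact, comparable summary statistic per paper, alongside the qualitative findings.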
This brief covers the systemic replication infrastructure problem, not any specific cancer biology question. It is structurally related to `health-research-antibody-validation-crisis` and `health-cell-line-misidentification-persistence` — all three represent failures in the foundational research infrastructure (reagents, models, methodology) rather than failures in specific disease biology. The Reproducibility Project data (openly available via eLife) provides unusually strong quantitative evidence of the problem's magnitude.
eLife, "Reproducibility Project: Cancer Biology" collection, 2021; eLife, "Challenges for assessing replicability in preclinical cancer biology," 2021; Science, "More than half of high-impact cancer lab studies could not be replicated," 2021.