This article is published in the June 2019 issue.

NAS Report on Reproducibility and Repeatability in Science

Significant contributions to this post were provided by Computing Community Consortium (CCC) Council Member Juliana Freire from NYU.

When an exciting new discovery is announced in our field, can we trust it? How was it produced? What data and code were used? How accurate are the results? Can they be reproduced?

Recently, Congress1 directed the National Science Foundation (NSF) to contract with the National Academies of Sciences, Engineering, and Medicine (NAS) to “undertake a study to assess reproducibility and replicability in scientific and engineering research and to provide findings and recommendations for improving rigor and transparency in research.”

An interdisciplinary committee of fifteen members, including CCC Council Member Juliana Freire, came together to “define what it means to reproduce or replicate a study across different scientific fields, to explore issues related to reproducibility and replicability across science and engineering, and to assess any impact of these issues on the public’s trust in science.” They produced the newly released NAS Reproducibility and Replicability in Science (2019) report.

A key takeaway from the report is that there is no crisis, but we cannot be complacent either. Reproducibility and replicability are important for attaining confidence in scientific knowledge, but they are not the end goal of science. The fact that a given result cannot be reproduced (or replicated) does not mean it is incorrect, and conversely, replication and reproducibility do not imply correctness. Multiple channels of evidence from a variety of studies provide a robust means for gaining confidence in scientific knowledge over time. In fact, the inability to replicate a study is part of the self-correcting nature of science – it can signal a problem, and it can also lead to new discoveries. At the same time, there is room for improvement. The report provides a series of recommendations to scientists, funding agencies, and publishers that aim to make reproducible research practices more widespread.

Various scientific disciplines define and use the terms “reproducibility” and “replicability” in different and sometimes contradictory ways. After considering the state of current usage, the committee adopted definitions that are intended to apply across all fields of science and help untangle the complex issues associated with reproducibility and replicability. Their definitions are as follows:

  • Reproducibility – obtaining consistent computational results using the same input data, computational steps, methods, code, and conditions of analysis.
  • Replicability – obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data.
  • Generalizability – the extent to which the results of a study apply in contexts or populations that differ from the original one.

In short, reproducibility involves the original data and code; replicability involves new data collection and methods similar to those used by previous studies. Note that the definition of reproducibility focuses on computation in recognition of its large and increasing role in scientific research.
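The computational notion of reproducibility defined above can be made concrete with a small sketch. The example below is purely illustrative and is not drawn from the report: it shows how fixing the random seed and reusing the same input data and code makes a toy analysis deterministic, so that independent runs produce identical results.

```python
import random

def analysis(data, seed=42):
    """Toy 'analysis': bootstrap the mean of `data` using a fixed seed."""
    rng = random.Random(seed)  # seeded RNG makes the computation deterministic
    samples = [rng.choice(data) for _ in range(1000)]
    return sum(samples) / len(samples)

data = [1.0, 2.0, 3.0, 4.0, 5.0]

# Same input data, same code, same conditions of analysis (the seed):
# repeated runs yield identical results, i.e. the study is reproducible
# in the report's computational sense.
assert analysis(data) == analysis(data)

# A different seed changes the conditions of analysis, so the result may
# differ -- which is why recording such conditions matters.
print(analysis(data), analysis(data, seed=7))
```

In real studies the "conditions of analysis" extend well beyond a seed (software versions, hardware, configuration), which is why the report emphasizes capturing and archiving complete digital artifacts.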

The authors have some recommendations for funding agencies:

Funding agencies and organizations should consider investing in the research and development of open-source, usable tools and infrastructure that support reproducibility for a broad range of studies across different domains in a seamless fashion. Concurrently, investments in outreach to inform and train researchers on best practices and on how to use these tools would be helpful.

In addition, they have specific recommendations for the NSF:

The NSF should, in harmony with other funders, endorse or create code and data repositories for long-term preservation of digital artifacts. In line with its expressed goal of “harnessing the data revolution,” NSF should consider funding tools, training, and activities to promote computational reproducibility.

The report also includes a set of criteria to help determine when testing replicability may be warranted, since “it is important for everyone involved in science to endeavor to maintain public trust in science based on a proper understanding of the contributions and limitations of scientific results.”

Finally, they end with an important warning: “a predominant focus on the replicability of individual studies is an inefficient way to assure the reliability of scientific knowledge. Rather, reviews of cumulative evidence on a subject, to assess both the overall effect size and generalizability, is often a more useful way to gain confidence in the state of scientific knowledge.”

If you always expect your results to be identical, you will not only be severely disappointed but could also miss an even greater discovery.

1This was done in response to Public Law 114-329, which cites “growing concern that some published research findings cannot be reproduced or replicated.”