Reproducibility Measure

A Reproducibility Measure is a similarity measure of the agreement between results of measurements of the same measurand carried out under the same operating conditions over a period of time, by different observers, or under changed conditions of measurement.

Context(s):
- It can be associated with Reproducible Data,
- It can be associated with Reproducible Research,
- It can be a requirement of Scientific Evidence.
Example(s):
Counter-Example(s):
- Accuracy,
- Irreproducibility,
- Repeatability,
- Uncertainty.
See: Scientific Evidence, Metascience, Measurement, Measurand, Scientific Method.

References

2019a

(Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Reproducibility Retrieved:2019-10-19.
- Reproducibility is the closeness of the agreement between the results of measurements of the same measurand carried out with same methodology described in the corresponding scientific evidence (e.g. a publication in a peer-reviewed journal). Reproducibility can also be applied under changed conditions of measurement for the same measurand - to check that the results are not an artefact of the measurement procedures [1] [2]. A related concept is replication, which is the ability to independently achieve non-identical conclusions that are at least similar, when differences in sampling, research procedures and data analysis methods may exist. Reproducibility and replicability together are among the main tools of "the scientific method" — with the concrete expressions of the ideal of such a method varying considerably across research disciplines and fields of study. The study of reproducibility is an important topic in metascience.

2019b

(Wiktionary, 2019) ⇒ https://en.wiktionary.org/wiki/reproducibility Retrieved:2019-10-19.
- QUOTE: 1. The quality of being reproducible.
  2. The closeness of agreement among repeated measurements of a variable made under the same operating conditions over a period of time, or by different people.

2018

(Plesser, 2018) ⇒ Hans E. Plesser. (2018). “Reproducibility Vs. Replicability: A Brief History of a Confused Terminology.” In: Frontiers in Neuroinformatics Journal, 11(76). doi:10.3389/fninf.2017.00076
- QUOTE: Together with some colleagues, I proposed similar definitions some years ago (Crook et al., 2013). The different terminologies are summarized in Table 1.

**Table 1:** Comparison of terminologies. See text for details.

| Goodman | Claerbout | ACM |
|---|---|---|
| | | Repeatability |
| Methods Reproducibility | Reproducibility | Replicability |
| Results Reproducibility | Replicability | Reproducibility |
| Inferential Reproducibility | | |

2016a

(ACM, 2016) ⇒ Association for Computing Machinery (2016). Artifact Review and Badging. Available online at: https://www.acm.org/publications/policies/artifact-review-badging Retrieved: 2019-10-19.
- QUOTE: A variety of research communities have embraced the goal of reproducibility in experimental science. Unfortunately, the terminology in use has not been uniform. Because of this we find it necessary to define our terms. The following are inspired by the International Vocabulary for Metrology (VIM); see the Appendix for details.
  - Repeatability (Same team, same experimental setup)
    The measurement can be obtained with stated precision by the same team using the same measurement procedure, the same measuring system, under the same operating conditions, in the same location on multiple trials. For computational experiments, this means that a researcher can reliably repeat her own computation.
  - Replicability (Different team, same experimental setup)
    The measurement can be obtained with stated precision by a different team using the same measurement procedure, the same measuring system, under the same operating conditions, in the same or a different location on multiple trials. For computational experiments, this means that an independent group can obtain the same result using the author’s own artifacts.
  - Reproducibility (Different team, different experimental setup)
    The measurement can be obtained with stated precision by a different team, a different measuring system, in a different location on multiple trials. For computational experiments, this means that an independent group can obtain the same result using artifacts which they develop completely independently.

The concepts of repeatability and reproducibility are taken directly from the VIM. Repeatability is something we expect of any well-controlled experiment. Results that are not repeatable are rarely suitable for publication. The proposed intermediate concept of replicability stems from the unique properties of computational experiments, i.e., that the measurement procedure/system, being virtual, is more easily portable, enabling inspection and exercise by others. While reproducibility is the ultimate goal, this initiative seeks to take an intermediate step, that is, to promote practices that lead to better replicability. We fully acknowledge that simple replication of results using author-supplied artifacts is a weak form of reproducibility. Nevertheless, it is an important first step, and the auditing processes that go well beyond traditional refereeing will begin to raise the bar for experimental research in computing.
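
The three ACM terms quoted above can be read as a lookup on two questions: is the team the same, and is the experimental setup the same? The following is a minimal sketch of that reading. The function name `acm_term` and its boolean parameters are hypothetical and are not part of the ACM policy.

```python
from typing import Optional


def acm_term(same_team: bool, same_setup: bool) -> Optional[str]:
    """Map (team, setup) to the ACM (2016) term quoted above.

    Hypothetical helper for illustration only. Returns None for the
    combination the quoted definitions do not name.
    """
    if same_team and same_setup:
        return "Repeatability"      # same team, same experimental setup
    if not same_team and same_setup:
        return "Replicability"      # different team, same experimental setup
    if not same_team and not same_setup:
        return "Reproducibility"    # different team, different experimental setup
    return None                     # same team, different setup: not defined in the quote


if __name__ == "__main__":
    for team, setup in [(True, True), (False, True), (False, False), (True, False)]:
        print(f"same_team={team}, same_setup={setup} -> {acm_term(team, setup)}")
```

The sketch deliberately returns `None` for the same-team, different-setup case rather than guessing a label, since the quoted definitions cover only three combinations.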

2016b

(Goodman et al., 2016) ⇒ Steven N. Goodman, Daniele Fanelli, and John P. A. Ioannidis. (2016). “What Does Research Reproducibility Mean?.” In: Science Translational Medicine Journal, 8(341). doi:10.1126/scitranslmed.aaf5027.
- QUOTE: Methods reproducibility is meant to capture the original meaning of reproducibility, that is, the ability to implement, as exactly as possible, the experimental and computational procedures, with the same data and tools, to obtain the same results. Results reproducibility refers to what was previously described as “replication,” that is, the production of corroborating results in a new study, having followed the same experimental methods. Inferential reproducibility, not often recognized as a separate concept, is the making of knowledge claims of similar strength from a study replication or reanalysis. This is not identical to results reproducibility, because not all investigators will draw the same conclusions from the same results, or they might make different analytical choices that lead to different inferences from the same data. Here, we explore the definitions and operational complexities of each of these concepts ...

2015

(Keyrouz & Mascagni, 2015) ⇒ Walid Keyrouz, and Michael V. Mascagni (2015). Scientific Software Sustainability: The Numerical Reproducibility Challenge (No. Computational Science & Engineering Software Sustainability and Productivity Challenges (CSESSP Challenges)).
- QUOTE: Experimental reproducibility is a cornerstone of the scientific method. The ease of achieving its counterpart in computing, numerical reproducibility, was one of the core assumptions underpinning the growth of scientific computing over the past several decades to become a powerful tool for scientific inquiry that is now widely considered as the third leg of science. The other core assumption was the deterministic behavior of computer hardware. Unfortunately, these assumptions are currently being challenged by hardware developments over the past several decades as discussed and documented in recent reports and workshops. In this position paper, we are advocating for the following actions: - Redefine numeric reproducibility by considering numeric results as computational measurements and treat them as the equivalent of physical measurements (...)
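
One way to picture a numeric result as a "computational measurement" is to repeat the same computation under a changed condition and look at the spread of the outcomes. The sketch below uses the order of floating-point additions as the changed condition. This mechanism is an assumption chosen for illustration (floating-point addition is not associative), and the quote itself does not name it. The function name `sums_over_orderings` and the sample values are hypothetical.

```python
import itertools
import math


def sums_over_orderings(values):
    """Return the set of left-to-right float sums over every ordering of values."""
    totals = set()
    for ordering in itertools.permutations(values):
        total = 0.0
        for v in ordering:
            total += v  # each addition is rounded, so the order can change the result
        totals.add(total)
    return totals


# Hypothetical inputs: large terms that cancel, plus small terms that can be lost.
values = [1e16, 1.0, -1e16, 1.0]

totals = sums_over_orderings(values)
spread = max(totals) - min(totals)   # a crude dispersion of the "measurements"
reference = math.fsum(values)        # correctly rounded sum, independent of order

if __name__ == "__main__":
    print("distinct sums:", sorted(totals))
    print("spread:", spread)
    print("math.fsum reference:", reference)
```

Treating `totals` like repeated physical measurements, the spread plays the role of a dispersion statistic, while `math.fsum` gives an order-independent reference value for comparison.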

2008

(JCGM, 2008) ⇒ JCGM Working Group 1 (2008). JCGM 100:2008 Evaluation of measurement data – Guide to the expression of uncertainty in measurement (PDF), Joint Committee for Guides in Metrology, 2008. GUM 1995 with minor corrections.
- QUOTE: reproducibility (of results of measurements) - closeness of the agreement between the results of measurements of the same measurand carried out under changed conditions of measurement.
  - NOTE 1: A valid statement of reproducibility requires specification of the conditions changed.
  - NOTE 2: The changed conditions may include:
    - principle of measurement;
    - method of measurement;
    - observer;
    - measuring instrument;
    - reference standard;
    - location;
    - conditions of use;
    - time.
  - NOTE 3: Reproducibility may be expressed quantitatively in terms of the dispersion characteristics of the results.
  - NOTE 4: Results are here usually understood to be corrected results.
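
NOTE 3 above says reproducibility may be expressed through the dispersion of the results. The sketch below shows one simple way to do that, assuming the changed condition is the observer and taking the sample standard deviation of the per-observer means as the dispersion statistic. The function name `reproducibility_sd` and the data are hypothetical, and this is not the estimator prescribed by any particular standard.

```python
from statistics import mean, stdev


def reproducibility_sd(results_by_condition):
    """Sample standard deviation of the mean result obtained under each condition.

    results_by_condition maps a label for the changed condition (for example,
    an observer) to the list of corrected results obtained under it.
    """
    condition_means = [mean(values) for values in results_by_condition.values()]
    if len(condition_means) < 2:
        raise ValueError("need results from at least two conditions")
    return stdev(condition_means)


# Hypothetical corrected results for one measurand; the changed condition is the observer.
results = {
    "observer_a": [10.1, 10.2, 10.0],
    "observer_b": [10.4, 10.5, 10.3],
    "observer_c": [9.9, 10.0, 9.8],
}

if __name__ == "__main__":
    print(f"dispersion across observers: {reproducibility_sd(results):.4f}")
```

In line with NOTE 1, the dictionary keys make the changed condition explicit: a smaller value indicates closer agreement between observers, and the number is only meaningful together with a statement of which condition was varied.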

1994

(Taylor & Kuyatt, 1994) ⇒ Barry N. Taylor, and Chris E. Kuyatt (1994). NIST Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results, Gaithersburg, MD, USA: National Institute of Standards and Technology.