Reproducibility Measure: Difference between revisions

Latest revision as of 04:37, 24 June 2024

A Reproducibility Measure is a similarity measure of the agreement between results of measurements of the same measurand carried out under the same operating conditions over a time period, by different observers, or under changed conditions of measurement.

Context(s):
- It can be associated with Reproducible Data,
- It can be associated with Reproducible Research,
- It can be a requirement of Scientific Evidence.
Example(s):
Counter-Example(s):
- Accuracy,
- Irreproducibility,
- Repeatability,
- Uncertainty
See: Scientific Evidence, Metascience, Measurement, Measurand, Scientific Method.

References

2019a

(Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Reproducibility Retrieved:2019-10-19.
- Reproducibility is the closeness of the agreement between the results of measurements of the same measurand carried out with same methodology described in the corresponding scientific evidence (e.g. a publication in a peer-reviewed journal). Reproducibility can also be applied under changed conditions of measurement for the same measurand - to check that the results are not an artefact of the measurement procedures [1] [2]. A related concept is replication, which is the ability to independently achieve non-identical conclusions that are at least similar, when differences in sampling, research procedures and data analysis methods may exist. Reproducibility and replicability together are among the main tools of "the scientific method" — with the concrete expressions of the ideal of such a method varying considerably across research disciplines and fields of study. The study of reproducibility is an important topic in metascience.

2019b

(Wiktionary, 2019) ⇒ https://en.wiktionary.org/wiki/reproducibility Retrieved:2019-10-19.
- QUOTE: 1. The quality of being reproducible.
  2. The closeness of agreement among repeated measurements of a variable made under the same operating conditions over a period of time, or by different people.

2018

(Plesser, 2018) ⇒ Hans E. Plesser. (2018). “Reproducibility Vs. Replicability: A Brief History of a Confused Terminology.” In: Frontiers in Neuroinformatics Journal, 11(76). doi:10.3389/fninf.2017.00076
- QUOTE: Together with some colleagues, I proposed similar definitions some years ago (Crook et al., 2013). The different terminologies are summarized in Table 1.

**Table 1:** Comparison of terminologies. See text for details.

| Goodman | Claerbout | ACM |
|---|---|---|
| | | Repeatability |
| Methods Reproducibility | Reproducibility | Replicability |
| Results Reproducibility | Replicability | Reproducibility |
| Inferential Reproducibility | | |

2016a

(ACM, 2016) ⇒ Association for Computing Machinery (2016). Artifact Review and Badging. Available online at: https://www.acm.org/publications/policies/artifact-review-badging Retrieved: 2019-10-19.
- QUOTE: A variety of research communities have embraced the goal of reproducibility in experimental science. Unfortunately, the terminology in use has not been uniform. Because of this we find it necessary to define our terms. The following are inspired by the International Vocabulary for Metrology (VIM); see the Appendix for details.
  - Repeatability (Same team, same experimental setup)
    The measurement can be obtained with stated precision by the same team using the same measurement procedure, the same measuring system, under the same operating conditions, in the same location on multiple trials. For computational experiments, this means that a researcher can reliably repeat her own computation.
  - Replicability (Different team, same experimental setup)
    The measurement can be obtained with stated precision by a different team using the same measurement procedure, the same measuring system, under the same operating conditions, in the same or a different location on multiple trials. For computational experiments, this means that an independent group can obtain the same result using the author’s own artifacts.
  - Reproducibility (Different team, different experimental setup)
    The measurement can be obtained with stated precision by a different team, a different measuring system, in a different location on multiple trials. For computational experiments, this means that an independent group can obtain the same result using artifacts which they develop completely independently.

The concepts of repeatability and reproducibility are taken directly from the VIM. Repeatability is something we expect of any well-controlled experiment. Results that are not repeatable are rarely suitable for publication. The proposed intermediate concept of replicability stems from the unique properties of computational experiments, i.e., that the measurement procedure/system, being virtual, is more easily portable, enabling inspection and exercise by others. While reproducibility is the ultimate goal, this initiative seeks to take an intermediate step, that is, to promote practices that lead to better replicability. We fully acknowledge that simple replication of results using author-supplied artifacts is a weak form of reproducibility. Nevertheless, it is an important first step, and the auditing processes that go well beyond traditional refereeing will begin to raise the bar for experimental research in computing.
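
The three ACM terms quoted above differ only in who repeats the measurement and whether the experimental setup is reused. A minimal sketch of that decision rule follows; the function name `acm_term` and its boolean parameters are hypothetical and are not part of the ACM policy. The quoted text does not name the case of the same team using a different setup, so the sketch returns `None` there.

```python
from typing import Optional


def acm_term(same_team: bool, same_setup: bool) -> Optional[str]:
    """Map a (team, setup) combination to the ACM (2016) term quoted above.

    Hypothetical helper for illustration only.
    """
    if same_team and same_setup:
        return "Repeatability"    # same team, same experimental setup
    if not same_team and same_setup:
        return "Replicability"    # different team, same experimental setup
    if not same_team and not same_setup:
        return "Reproducibility"  # different team, different experimental setup
    # Same team with a different setup is not named in the quoted definitions.
    return None


# An independent group rerunning the author's own artifacts:
term_for_artifact_rerun = acm_term(same_team=False, same_setup=True)
```

In this reading, "same setup" for a computational experiment corresponds to reusing the author's own artifacts, and "different setup" to artifacts developed independently.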

2016b

(Goodman et al., 2016) ⇒ Steven N. Goodman, Daniele Fanelli, and John P. A. Ioannidis. (2016). “What Does Research Reproducibility Mean?.” In: Science Translational Medicine Journal, 8(341). doi:10.1126/scitranslmed.aaf5027.
- QUOTE: Methods reproducibility is meant to capture the original meaning of reproducibility, that is, the ability to implement, as exactly as possible, the experimental and computational procedures, with the same data and tools, to obtain the same results. Results reproducibility refers to what was previously described as “replication,” that is, the production of corroborating results in a new study, having followed the same experimental methods. Inferential reproducibility, not often recognized as a separate concept, is the making of knowledge claims of similar strength from a study replication or reanalysis. This is not identical to results reproducibility, because not all investigators will draw the same conclusions from the same results, or they might make different analytical choices that lead to different inferences from the same data. Here, we explore the definitions and operational complexities of each of these concepts ...

2015

(Keyrouz & Mascagni, 2015) ⇒ Walid Keyrouz, and Michael V. Mascagni (2015). Scientific Software Sustainability: The Numerical Reproducibility Challenge (No. Computational Science & Engineering Software Sustainability and Productivity Challenges (CSESSP Challenges)).
- QUOTE: Experimental reproducibility is a cornerstone of the scientific method. The ease of achieving its counterpart in computing, numerical reproducibility, was one of the core assumptions underpinning the growth of scientific computing over the past several decades to become a powerful tool for scientific inquiry that is now widely considered as the third leg of science. The other core assumption was the deterministic behavior of computer hardware. Unfortunately, these assumptions are currently being challenged by hardware developments over the past several decades as discussed and documented in recent reports and workshops. In this position paper, we are advocating for the following actions: - Redefine numeric reproducibility by considering numeric results as computational measurements and treat them as the equivalent of physical measurements (...)
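
One well-known reason numerical results can differ between runs is that floating-point addition is not associative, so the order in which terms are summed (for example, in a parallel reduction) can change the last bits of the result. The sketch below only illustrates that effect with Python's standard library; it is not taken from Keyrouz & Mascagni, and the choice of values is an assumption made for the example.

```python
import math

values = [0.1, 0.2, 0.3]

# The same three numbers, added in two different groupings.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

# In IEEE 754 double precision the two groupings round differently.
order_matters = left_to_right != right_to_left

# math.fsum returns the correctly rounded sum of its inputs,
# so its result does not depend on the order of the terms.
stable_forward = math.fsum(values)
stable_reverse = math.fsum(reversed(values))
```

Treating such results as measurements, as the quote proposes, would mean comparing them within a stated tolerance rather than requiring bitwise equality.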

2008

(JCGM, 2008) ⇒ JCGM Working Group 1 (2008). JCGM 100:2008 Evaluation of measurement data – Guide to the expression of uncertainty in measurement (PDF), Joint Committee for Guides in Metrology, 2008. GUM 1995 with minor corrections.
- QUOTE: reproducibility (of results of measurements) - closeness of the agreement between the results of measurements of the same measurand carried out under changed conditions of measurement.
  - NOTE 1: A valid statement of reproducibility requires specification of the conditions changed.
  - NOTE 2: The changed conditions may include:
    - principle of measurement;
    - method of measurement;
    - observer;
    - measuring instrument;
    - reference standard;
    - location;
    - conditions of use;
    - time.
  - NOTE 3: Reproducibility may be expressed quantitatively in terms of the dispersion characteristics of the results.
  - NOTE 4: Results are here usually understood to be corrected results.
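
NOTE 3 says reproducibility may be expressed through the dispersion of the results. As a minimal sketch, assuming the changed condition is the observer and that dispersion is summarized by the sample standard deviation of the per-observer means, this could look as follows. The function name `reproducibility_sd` and all readings are made up for illustration and do not come from JCGM 100:2008.

```python
from statistics import mean, stdev


def reproducibility_sd(results_by_condition):
    """Sample standard deviation of the per-condition mean results.

    Hypothetical helper: one of several possible dispersion summaries.
    """
    condition_means = [mean(values) for values in results_by_condition.values()]
    if len(condition_means) < 2:
        raise ValueError("need results from at least two conditions")
    return stdev(condition_means)


# Made-up readings of one measurand taken by three observers.
readings = {
    "observer_A": [10.0, 10.2, 10.1],
    "observer_B": [10.4, 10.3, 10.5],
    "observer_C": [9.9, 10.0, 10.1],
}
spread = reproducibility_sd(readings)
```

In line with NOTE 1, a number like `spread` is only meaningful alongside a statement of which condition was changed (here, the observer).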

1994

(Taylor & Kuyatt, 1994) ⇒ Barry N. Taylor, and Chris E. Kuyatt (1994). NIST Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results, Gaithersburg, MD, USA: National Institute of Standards and Technology.

@@ Line 1: / Line 1: @@
 A [[Reproducibility Measure]] is a [[similarity measure]] in agreement between results of [[measurement]]s of the same [[measurand]] carried out by the same [[operating condition]]s over a [[time period]], or by different [[observer]]s or under changed [[condition]]s of [[measurement]].
-* <B> Context(s):</B>
+* <B>Context(s):</B>
-** It is the [[quality]] of [[Reproducible Data]],
+** It can be associated with [[Reproducible Data]],
-** It is the [[quality]] of a [[Reproducible Research]],
+** It can be associated with [[Reproducible Research]],
-** It is the [[property]] of [[Scientific Evidence]].
+** It can be a requirement of [[Scientific Evidence]].
 * <B>Example(s):</B>
 ** [[Inferential Reproducibility]],
@@ Line 15: / Line 15: @@
 ** [[Uncertainty]]
 * <B>See:</B> [[Scientific Evidence]], [[Metascience]], [[Measurement]], [[Measurand]], [[Scientific Method]].
 ----
 ----
@@ Line 22: / Line 23: @@
 === 2019a ===
 * (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Reproducibility Retrieved:2019-10-19.
-** '''Reproducibility''' is the closeness of the agreement between the results of [[measurements]] of the same measurand carried out with same methodology described in the corresponding scientific evidence (e.g. a publication in a peer-reviewed journal). Reproducibility can also be applied under changed conditions of measurement for the same measurand - to check that the results are not an artefact of the measurement procedures &#91;[[#2008|1]]&#93; &#91;[[#1994|2]]&#93;. A related concept is replication, which is the ability to independently achieve non-identical conclusions that are at least similar, when differences in sampling, research procedures and data analysis methods may exist.  Reproducibility and replicability together are among the main tools of [[scientific method | "the scientific method"]]  — with the concrete expressions of the ideal of such a method varying considerably across research disciplines and fields of study.The study of reproducibility is an important topic in [[metascience]].
+** '''Reproducibility''' is the closeness of the agreement between the results of [[measurement]]s of the same measurand carried out with same methodology described in the corresponding scientific evidence (e.g. a publication in a peer-reviewed journal). Reproducibility can also be applied under changed conditions of measurement for the same measurand - to check that the results are not an artefact of the measurement procedures &#91;[[#2008|1]]&#93; &#91;[[#1994|2]]&#93;. A related concept is replication, which is the ability to independently achieve non-identical conclusions that are at least similar, when differences in sampling, research procedures and data analysis methods may exist.  Reproducibility and replicability together are among the main tools of [[scientific method | "the scientific method"]] — with the concrete expressions of the ideal of such a method varying considerably across research disciplines and fields of study. The study of reproducibility is an important topic in [[metascience]].
 === 2019b ===
 * (Wiktionary, 2019) ⇒ https://en.wiktionary.org/wiki/reproducibility Retrieved:2019-10-19.
-** QUOTE: 1. The [[quality]] of being [[reproducible]]. <P>2. The [[closeness of agreement]] among [[repeated measurement]]s of a [[variable]] made under the same [[operating condition]]s over a [[period of time]], or by different people.
+** QUOTE: 1. The [[quality]] of being [[Reproducibility Measure|reproducible]].         <P>          2. The [[closeness of agreement]] among [[repeated measurement]]s of a [[variable]] made under the same [[operating condition]]s over a [[period of time]], or by different people.
+=== 2018 ===
+* ([[2018_ReproducibilityVsReplicabilityA|Plesser, 2018]]) ⇒ [[Hans E. Plesser]]. ([[2018]]). &ldquo;[https://www.frontiersin.org/articles/10.3389/fninf.2017.00076/full Reproducibility Vs. Replicability: A Brief History of a Confused Terminology].&rdquo; In: Frontiers in neuroinformatics Journal, 11(76). [http://dx.doi.org/10.3389/fninf.2017.00076 doi:10.3389/fninf.2017.00076]
+** QUOTE: Together with some [[colleague]]s, I proposed similar [[definition]]s some years ago ([[2018_ReproducibilityVsReplicabilityA#Crook2013|Crook et al., 2013]]). The different [[terminologi]]es are summarized in [[#TAB1|Table 1]].         <P>         <P><div id="TAB1">
+{| class="wikitable" style="border: 1px; solid black; border-spacing: 1px;; margin: 1em auto; width: 900px;"
+|-
+!Goodman!!Claerbout!!ACM
+|-
+| || || [[Repeatability]]
+|-
+|[[Methods Reproducibility]]||[[Reproducibility Measure|Reproducibility]]||[[Replicability]]
+|-
+|[[Results Reproducibility]]|||[[Replicability]]|| [[Reproducibility Measure|Reproducibility]]
+|-
+|[[Inferential Reproducibility]]|| ||
+|+ align="bottom" style="caption-side: bottom; text-align: left; font-weight: normal;" |<P>'''Table 1:''' Table 1. Comparison of [[terminologi]]es. See text for details.
+|}</div>
+=== 2016a ===
+* (ACM, 2016) ⇒ Association for Computing Machinery (2016). Artifact Review and Badging. Available online at: https://www.acm.org/publications/policies/artifact-review-badging Retrieved: 2019-10-19.
+** QUOTE: A variety of [[research communiti]]es have embraced the goal of [[Reproducibility Measure|reproducibility]] in [[experimental science]]. Unfortunately, the [[terminology]] in use has not been [[uniform]]. Because of this we find it necessary to [[define]] our [[term]]s. The following are inspired by the [[International Vocabulary for Metrology(VIM)]]; see the [https://www.acm.org/publications/policies/artifact-review-badging#appendix Appendix]] for details.
+*** [[Repeatability]] (Same [[Research Team|team]], same [[experimental setup]])         <P>        The [[measurement]] can be obtained with stated [[precision]] by the same [[Research Team|team using the same [[measurement procedure]], the same [[measuring system]], under the same [[operating condition]]s, in the same [[location]] on multiple [[trial]]s. For [[computational experiment]]s, this means that a [[researcher]] can reliably [[repeat]] her own [[computation]].
+*** [[Replicability]] (Different team, same [[experimental setup]])<P>        The [[measurement]] can be obtained with stated [[precision]] by a different [[Research Team|team]] using the same [[measurement procedure]], the same [[measuring system]], under the same [[operating condition]]s, in the same or a different [[location]] on multiple [[trial]]s. For [[computational experiment]]s, this means that an [[independent group]] can obtain the same [[result]] using the author’s own [[artifact]]s.
+*** [[Reproducibility Measure|Reproducibility]] (Different [[Research Team|team]], different [[experimental setup]])         <P>          The [[measurement]] can be obtained with stated [[precision]] by a different [[Research Team|team]], a different [[measuring system]], in a different [[location]] on multiple [[trial]]s. For [[computational experiment]]s, this means that an [[independent group]] can obtain the same [[result]] using [[artifact]]s which they develop completely independently.
+:: The [[concept]]s of [[repeatability]] and [[Reproducibility Measure|reproducibility]] are taken directly from the [[International Vocabulary for Metrology|VIM]]. [[Repeatability]] is something we expect of any [[well-controlled experiment]]. [[Result]]s that are not [[repeatable]] are rarely suitable for [[publication]]. The proposed [[intermediate concept]] of [[replicability]] [[stem]]s from the unique properties of [[computational experiment]]s, i.e., that the [[measurement procedure]]/[[Measuring System|system]], being virtual, is more easily portable, enabling inspection and exercise by others. While [[Reproducibility Measure|reproducibility]] is the [[ultimate goal]], this initiative seeks to take an [[intermediate step]], that is, to promote practices that lead to better [[replicability]]. We fully acknowledge that simple [[replication]] of [[result]]s using [[author]]-supplied [[artifact]]s is a weak form of [[Reproducibility Measure|reproducibility]]. Nevertheless, it is an important first step, and the [[auditing process]]es that go well beyond traditional [[refereeing]] will begin to raise the bar for [[experimental research]] in [[computing]].
-=== 2016 ===
+=== 2016b ===
-* (Goodman et al., 2016) ⇒ [[Steven N. Goodman]], [[Daniele Fanelli]] and [[John P. A. Ioannidis]] (2016). [https://stm.sciencemag.org/content/8/341/341ps12 "What does research reproducibility mean?"]. Science translational medicine, 8(341). [https:doi.org/10.1126/scitranslmed.aaf5027 DOI: 10.1126/scitranslmed.aaf5027]
+* ([[2016_WhatDoesResearchReproducibility|Goodman et al., 2016]]) ⇒ [[Steven N. Goodman]], [[Daniele Fanelli]], and [[John P. A. Ioannidis]]. ([[2016]]). &ldquo;[https://stm.sciencemag.org/content/8/341/341ps12 What Does Research Reproducibility Mean?].&rdquo; In: [[Science Translational Medicine Journal]], 8(341). [http://dx.doi.org/10.1126/scitranslmed.aaf5027 doi:10.1126/scitranslmed.aaf5027].
-** QUOTE: [[Methods reproducibility]] is meant to capture the original meaning of [[reproducibility]], that is, the ability to implement, as exactly as possible, the [[experimental]] and [[computational procedure]]s, with the same [[data]] and [[tool]]s, to obtain the same [[result]]s. [[Results reproducibility]] refers to what was previously described as “[[replication]],” that is, the production of [[corroborating result]]s in a new study, having followed the same [[experimental method]]s. [[Inferential reproducibility]], not often recognized as a separate concept, is the making of [[knowledge claim]]s of similar strength from a study [[replication]] or [[reanalysis]]. This is not [[identical]] to [[results reproducibility]], because not all investigators will draw the same conclusions from the same results, or they might make different [[analytical choice]]s that lead to different [[inference]]s from the same [[data]]. Here, we explore the [[definition]]s and operational complexities of each of these concepts ...
+** [[2016_WhatDoesResearchReproducibility#A_New_Lexicon_For_Research_Reproducibility|QUOTE]]: [[Methods reproducibility]] is meant to capture the original meaning of [[Reproducibility Measure|reproducibility]], that is, the ability to implement, as exactly as possible, the [[experimental]] and [[computational procedure]]s, with the same [[data]] and [[tool]]s, to obtain the same [[result]]s. [[Results reproducibility]] refers to what was previously described as “[[replication]],” that is, the production of [[corroborating result]]s in a new study, having followed the same [[experimental method]]s. [[Inferential reproducibility]], not often recognized as a separate concept, is the making of [[knowledge claim]]s of similar strength from a study [[replication]] or [[reanalysis]]. This is not [[identical]] to [[results reproducibility]], because not all investigators will draw the same conclusions from the same results, or they might make different [[analytical choice]]s that lead to different [[inference]]s from the same [[data]]. Here, we explore the [[definition]]s and operational complexities of each of these concepts ...
 === 2015 ===
 * (Keyrouz & Mascagni, 2015) ⇒ [[Walid Keyrouz]], and [[Michael V. Mascagni]] (2015). [https://www.nist.gov/publications/scientific-software-sustainability-numerical-reproducibility-challenge Scientific Software Sustainability: The Numerical Reproducibility Challenge] (No. Computational Science & Engineering Software Sustainability and Productivity Challenges (CSESSP Challenges)).
-** QUOTE: [[Experimental reproducibility]] is a cornerstone of the [[scientific method]]. The ease of achieving its [[counterpart]] in [[computing]], [[numerical reproducibility]], was one of the core assumptions underpinning the growth of [[scientific computing]] over the past several [[decade]]s to become a powerful tool for [[scientific inquiry]] that is now widely considered as the ''third leg of science''. The other core assumption was the [[deterministic behavior]] of [[computer hardware]]. Unfortunately, these assumptions are currently being challenged by [[hardware development]]s over the past several decades as discussed and documented in recent reports and workshops. In this position paper, we are advocating for the following actions: - Redefine [[numeric reproducibility]] by considering [[numeric result]]s as [[computational measurement]]s and treat them as the equivalent of [[physical measurement]]s (...)
+** QUOTE: [[Reproducibility Measure|Experimental reproducibility]] is a cornerstone of the [[scientific method]]. The ease of achieving its [[counterpart]] in [[computing]], [[numerical reproducibility]], was one of the core assumptions underpinning the growth of [[scientific computing]] over the past several [[decade]]s to become a powerful tool for [[scientific inquiry]] that is now widely considered as the ''third leg of science''. The other core assumption was the [[deterministic behavior]] of [[computer hardware]]. Unfortunately, these assumptions are currently being challenged by [[hardware development]]s over the past several decades as discussed and documented in recent reports and workshops. In this position paper, we are advocating for the following actions: - Redefine [[numeric reproducibility]] by considering [[numeric result]]s as [[computational measurement]]s and treat them as the equivalent of [[physical measurement]]s (...)
 === 2008 ===
 * (JCGM, 2008) ⇒ JCGM Working Group 1  (2008).[http://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf . JCGM 100:2008  Evaluation of measurement data – Guide to the expression of uncertainty in measurement (PDF)], Joint Committee for Guides in Metrology, 2008. GUM 1995 with minor corrections.
-** QUOTE: '''[[Reproducibility|reproducibility (of results of measurements)]]''' - [[closeness]] of the agreement between the results of [[measurement]]s of the same [[measurand]] carried out under changed [[condition]]s of [[measurement]].
+** QUOTE: <B>[[Reproducibility Measure|reproducibility (of results of measurements)]]</B> - [[closeness]] of the agreement between the results of [[measurement]]s of the same [[measurand]] carried out under changed [[condition]]s of [[measurement]].
-*** NOTE 1: A [[valid statement]] of [[reproducibility]] requires specification of the [[condition]]s changed.
+*** NOTE 1: A [[valid statement]] of [[Reproducibility Measure|reproducibility]] requires specification of the [[condition]]s changed.
 *** NOTE 2: The changed [[condition]]s may include:
 **** [[principle of measurement]];
@@ Line 49: / Line 75: @@
 **** [[condition]]s of use
 **** [[time]].
-*** NOTE 3: [[Reproducibility]] may be expressed [[quantitatively]] in terms of the [[dispersion]] characteristics of the [[result]]s.
+*** NOTE 3: [[Reproducibility Measure|Reproducibility]] may be expressed [[quantitatively]] in terms of the [[dispersion]] characteristics of the [[result]]s.
 *** NOTE 4: Results are here usually understood to be corrected results.
 === 1994 ===
 *  (Taylor & Kuyatt, 1994) ⇒ [[Barry N. Taylor]], and [[Chris E. Kuyatt]] (1994). [https://www.nist.gov/pml/nist-technical-note-1297 NIST Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results Cover], Gaithersburg, MD, USA: National Institute of Standards and Technology.
 ----
+__NOTOC__
 [[Category:Concept]]
-__NOTOC__