Reproducibility Measure: Difference between revisions

(Created page with "A Reproducibility Measure is a similarity measure in agreement between results of measurements of the same measurand carried out by the same operating condit...")
 
m (Text replacement - "ments]]" to "ment]]s")
 
(35 intermediate revisions by 4 users not shown)
Line 1: Line 1:
A [[Reproducibility Measure]] is a [[similarity measure]] of the agreement between results of [[measurement]]s of the same [[measurand]] carried out under the same [[operating condition]]s over a [[time period]], by different [[observer]]s, or under changed [[condition]]s of [[measurement]].
A [[Reproducibility Measure]] is a [[similarity measure]] of the agreement between results of [[measurement]]s of the same [[measurand]] carried out under the same [[operating condition]]s over a [[time period]], by different [[observer]]s, or under changed [[condition]]s of [[measurement]].
* <B> Context(s):</B>
* <B>Context(s):</B>
** It is the [[quality]] of [[Reproducible Data]],  
** It can be associated with [[Reproducible Data]],  
** It is the [[quality]] of a [[Reproducible Research]],
** It can be associated with [[Reproducible Research]],
** It is the [[property]] of [[Scientific Evidence]].
** It can be a requirement of [[Scientific Evidence]].
* <B>Example(s):</B>
* <B>Example(s):</B>
** [[Inferential Reproducibility]],
** [[Inferential Reproducibility]],
Line 15: Line 15:
** [[Uncertainty]]  
** [[Uncertainty]]  
* <B>See:</B> [[Scientific Evidence]], [[Metascience]], [[Measurement]], [[Measurand]], [[Scientific Method]].
* <B>See:</B> [[Scientific Evidence]], [[Metascience]], [[Measurement]], [[Measurand]], [[Scientific Method]].
----
----
----
----
Line 22: Line 23:
=== 2019a ===
=== 2019a ===
* (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Reproducibility Retrieved:2019-10-19.
* (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Reproducibility Retrieved:2019-10-19.
** '''Reproducibility''' is the closeness of the agreement between the results of [[measurements]] of the same measurand carried out with same methodology described in the corresponding scientific evidence (e.g. a publication in a peer-reviewed journal). Reproducibility can also be applied under changed conditions of measurement for the same measurand - to check that the results are not an artefact of the measurement procedures &#91;[[#2008|1]]&#93; &#91;[[#1994|2]]&#93;. A related concept is replication, which is the ability to independently achieve non-identical conclusions that are at least similar, when differences in sampling, research procedures and data analysis methods may exist.  Reproducibility and replicability together are among the main tools of [[scientific method | "the scientific method"]] — with the concrete expressions of the ideal of such a method varying considerably across research disciplines and fields of study.The study of reproducibility is an important topic in [[metascience]].  
** '''Reproducibility''' is the closeness of the agreement between the results of [[measurement]]s of the same measurand carried out with same methodology described in the corresponding scientific evidence (e.g. a publication in a peer-reviewed journal). Reproducibility can also be applied under changed conditions of measurement for the same measurand - to check that the results are not an artefact of the measurement procedures &#91;[[#2008|1]]&#93; &#91;[[#1994|2]]&#93;. A related concept is replication, which is the ability to independently achieve non-identical conclusions that are at least similar, when differences in sampling, research procedures and data analysis methods may exist.  Reproducibility and replicability together are among the main tools of [[scientific method | "the scientific method"]] — with the concrete expressions of the ideal of such a method varying considerably across research disciplines and fields of study. The study of reproducibility is an important topic in [[metascience]].


=== 2019b ===
=== 2019b ===
* (Wiktionary, 2019) ⇒ https://en.wiktionary.org/wiki/reproducibility Retrieved:2019-10-19.
* (Wiktionary, 2019) ⇒ https://en.wiktionary.org/wiki/reproducibility Retrieved:2019-10-19.
** QUOTE: 1. The [[quality]] of being [[reproducible]]. <P>2. The [[closeness of agreement]] among [[repeated measurement]]s of a [[variable]] made under the same [[operating condition]]s over a [[period of time]], or by different people.
** QUOTE: 1. The [[quality]] of being [[Reproducibility Measure|reproducible]].         <P>         2. The [[closeness of agreement]] among [[repeated measurement]]s of a [[variable]] made under the same [[operating condition]]s over a [[period of time]], or by different people.
 
=== 2018 ===
* ([[2018_ReproducibilityVsReplicabilityA|Plesser, 2018]]) ⇒ [[Hans E. Plesser]]. ([[2018]]). &ldquo;[https://www.frontiersin.org/articles/10.3389/fninf.2017.00076/full Reproducibility Vs. Replicability: A Brief History of a Confused Terminology].&rdquo; In: Frontiers in neuroinformatics Journal, 11(76). [http://dx.doi.org/10.3389/fninf.2017.00076 doi:10.3389/fninf.2017.00076]
** QUOTE: Together with some [[colleague]]s, I proposed similar [[definition]]s some years ago ([[2018_ReproducibilityVsReplicabilityA#Crook2013|Crook et al., 2013]]). The different [[terminologi]]es are summarized in [[#TAB1|Table 1]].        <P>        <P><div id="TAB1">
{| class="wikitable" style="border: 1px solid black; border-spacing: 1px; margin: 1em auto; width: 900px;"
|-
!Goodman!!Claerbout!!ACM
|-
| || || [[Repeatability]]
|-
|[[Methods Reproducibility]]||[[Reproducibility Measure|Reproducibility]]||[[Replicability]]
|-
|[[Results Reproducibility]]||[[Replicability]]||[[Reproducibility Measure|Reproducibility]]
|-
|[[Inferential Reproducibility]]|| ||
|+ align="bottom" style="caption-side: bottom; text-align: left; font-weight: normal;" |<P>'''Table 1:''' Comparison of [[terminologi]]es. See text for details.
|}</div>
 
=== 2016a ===
* (ACM, 2016) ⇒ Association for Computing Machinery (2016). Artifact Review and Badging. Available online at: https://www.acm.org/publications/policies/artifact-review-badging Retrieved: 2019-10-19.
** QUOTE: A variety of [[research communiti]]es have embraced the goal of [[Reproducibility Measure|reproducibility]] in [[experimental science]]. Unfortunately, the [[terminology]] in use has not been [[uniform]]. Because of this we find it necessary to [[define]] our [[term]]s. The following are inspired by the [[International Vocabulary for Metrology(VIM)]]; see the [https://www.acm.org/publications/policies/artifact-review-badging#appendix Appendix] for details.
*** [[Repeatability]] (Same [[Research Team|team]], same [[experimental setup]])        <P>        The [[measurement]] can be obtained with stated [[precision]] by the same [[Research Team|team]] using the same [[measurement procedure]], the same [[measuring system]], under the same [[operating condition]]s, in the same [[location]] on multiple [[trial]]s. For [[computational experiment]]s, this means that a [[researcher]] can reliably [[repeat]] her own [[computation]].
*** [[Replicability]] (Different team, same [[experimental setup]])<P>        The [[measurement]] can be obtained with stated [[precision]] by a different [[Research Team|team]] using the same [[measurement procedure]], the same [[measuring system]], under the same [[operating condition]]s, in the same or a different [[location]] on multiple [[trial]]s. For [[computational experiment]]s, this means that an [[independent group]] can obtain the same [[result]] using the author’s own [[artifact]]s.
*** [[Reproducibility Measure|Reproducibility]] (Different [[Research Team|team]], different [[experimental setup]])        <P>          The [[measurement]] can be obtained with stated [[precision]] by a different [[Research Team|team]], a different [[measuring system]], in a different [[location]] on multiple [[trial]]s. For [[computational experiment]]s, this means that an [[independent group]] can obtain the same [[result]] using [[artifact]]s which they develop completely independently.
:: The [[concept]]s of [[repeatability]] and [[Reproducibility Measure|reproducibility]] are taken directly from the [[International Vocabulary for Metrology|VIM]]. [[Repeatability]] is something we expect of any [[well-controlled experiment]]. [[Result]]s that are not [[repeatable]] are rarely suitable for [[publication]]. The proposed [[intermediate concept]] of [[replicability]] [[stem]]s from the unique properties of [[computational experiment]]s, i.e., that the [[measurement procedure]]/[[Measuring System|system]], being virtual, is more easily portable, enabling inspection and exercise by others. While [[Reproducibility Measure|reproducibility]] is the [[ultimate goal]], this initiative seeks to take an [[intermediate step]], that is, to promote practices that lead to better [[replicability]]. We fully acknowledge that simple [[replication]] of [[result]]s using [[author]]-supplied [[artifact]]s is a weak form of [[Reproducibility Measure|reproducibility]]. Nevertheless, it is an important first step, and the [[auditing process]]es that go well beyond traditional [[refereeing]] will begin to raise the bar for [[experimental research]] in [[computing]].
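
The three ACM terms quoted above differ only in whether the team and the experimental setup change. A minimal sketch of that decision rule follows, assuming a hypothetical helper named <code>acm_term</code> that is not part of the ACM policy itself. The quoted text does not name the case of the same team with a different setup, so the sketch leaves it unnamed.

```python
def acm_term(same_team: bool, same_setup: bool) -> str:
    """Map (team, setup) conditions to the ACM (2016) term quoted above.

    Hypothetical illustration only; the name and signature are assumptions.
    """
    if same_team and same_setup:
        return "repeatability"      # same team, same experimental setup
    if not same_team and same_setup:
        return "replicability"      # different team, same experimental setup
    if not same_team and not same_setup:
        return "reproducibility"    # different team, different experimental setup
    # Same team with a different setup is not covered by the quoted definitions.
    return "not named in the quoted definitions"


# An independent group rerunning the authors' own artifacts:
term_for_rerun = acm_term(same_team=False, same_setup=True)
```

Here <code>term_for_rerun</code> evaluates to "replicability", matching the quoted statement that an independent group obtains the same result using the author's own artifacts.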


=== 2016 ===
=== 2016b ===
* (Goodman et al., 2016) ⇒ [[Steven N. Goodman]], [[Daniele Fanelli]] and [[John P. A. Ioannidis]] (2016). [https://stm.sciencemag.org/content/8/341/341ps12 "What does research reproducibility mean?"]. Science translational medicine, 8(341). [https:doi.org/10.1126/scitranslmed.aaf5027 DOI: 10.1126/scitranslmed.aaf5027]
* ([[2016_WhatDoesResearchReproducibility|Goodman et al., 2016]]) ⇒ [[Steven N. Goodman]], [[Daniele Fanelli]], and [[John P. A. Ioannidis]]. ([[2016]]). &ldquo;[https://stm.sciencemag.org/content/8/341/341ps12 What Does Research Reproducibility Mean?].&rdquo; In: [[Science Translational Medicine Journal]], 8(341). [http://dx.doi.org/10.1126/scitranslmed.aaf5027 doi:10.1126/scitranslmed.aaf5027].
** QUOTE: [[Methods reproducibility]] is meant to capture the original meaning of [[reproducibility]], that is, the ability to implement, as exactly as possible, the [[experimental]] and [[computational procedure]]s, with the same [[data]] and [[tool]]s, to obtain the same [[result]]s. [[Results reproducibility]] refers to what was previously described as “[[replication]],” that is, the production of [[corroborating result]]s in a new study, having followed the same [[experimental method]]s. [[Inferential reproducibility]], not often recognized as a separate concept, is the making of [[knowledge claim]]s of similar strength from a study [[replication]] or [[reanalysis]]. This is not [[identical]] to [[results reproducibility]], because not all investigators will draw the same conclusions from the same results, or they might make different [[analytical choice]]s that lead to different [[inference]]s from the same [[data]]. Here, we explore the [[definition]]s and operational complexities of each of these concepts ...
** [[2016_WhatDoesResearchReproducibility#A_New_Lexicon_For_Research_Reproducibility|QUOTE]]: [[Methods reproducibility]] is meant to capture the original meaning of [[Reproducibility Measure|reproducibility]], that is, the ability to implement, as exactly as possible, the [[experimental]] and [[computational procedure]]s, with the same [[data]] and [[tool]]s, to obtain the same [[result]]s. [[Results reproducibility]] refers to what was previously described as “[[replication]],” that is, the production of [[corroborating result]]s in a new study, having followed the same [[experimental method]]s. [[Inferential reproducibility]], not often recognized as a separate concept, is the making of [[knowledge claim]]s of similar strength from a study [[replication]] or [[reanalysis]]. This is not [[identical]] to [[results reproducibility]], because not all investigators will draw the same conclusions from the same results, or they might make different [[analytical choice]]s that lead to different [[inference]]s from the same [[data]]. Here, we explore the [[definition]]s and operational complexities of each of these concepts ...


=== 2015 ===
=== 2015 ===
* (Keyrouz & Mascagni, 2015) ⇒ [[Walid Keyrouz]], and [[Michael V. Mascagni]] (2015). [https://www.nist.gov/publications/scientific-software-sustainability-numerical-reproducibility-challenge Scientific Software Sustainability: The Numerical Reproducibility Challenge] (No. Computational Science & Engineering Software Sustainability and Productivity Challenges (CSESSP Challenges)).
* (Keyrouz & Mascagni, 2015) ⇒ [[Walid Keyrouz]], and [[Michael V. Mascagni]] (2015). [https://www.nist.gov/publications/scientific-software-sustainability-numerical-reproducibility-challenge Scientific Software Sustainability: The Numerical Reproducibility Challenge] (No. Computational Science & Engineering Software Sustainability and Productivity Challenges (CSESSP Challenges)).
** QUOTE: [[Experimental reproducibility]] is a cornerstone of the [[scientific method]]. The ease of achieving its [[counterpart]] in [[computing]], [[numerical reproducibility]], was one of the core assumptions underpinning the growth of [[scientific computing]] over the past several [[decade]]s to become a powerful tool for [[scientific inquiry]] that is now widely considered as the ''third leg of science''. The other core assumption was the [[deterministic behavior]] of [[computer hardware]]. Unfortunately, these assumptions are currently being challenged by [[hardware development]]s over the past several decades as discussed and documented in recent reports and workshops. In this position paper, we are advocating for the following actions: - Redefine [[numeric reproducibility]] by considering [[numeric result]]s as [[computational measurement]]s and treat them as the equivalent of [[physical measurement]]s (...)
** QUOTE: [[Reproducibility Measure|Experimental reproducibility]] is a cornerstone of the [[scientific method]]. The ease of achieving its [[counterpart]] in [[computing]], [[numerical reproducibility]], was one of the core assumptions underpinning the growth of [[scientific computing]] over the past several [[decade]]s to become a powerful tool for [[scientific inquiry]] that is now widely considered as the ''third leg of science''. The other core assumption was the [[deterministic behavior]] of [[computer hardware]]. Unfortunately, these assumptions are currently being challenged by [[hardware development]]s over the past several decades as discussed and documented in recent reports and workshops. In this position paper, we are advocating for the following actions: - Redefine [[numeric reproducibility]] by considering [[numeric result]]s as [[computational measurement]]s and treat them as the equivalent of [[physical measurement]]s (...)
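
The proposal quoted above, to treat numeric results as computational measurements, can be illustrated with floating-point addition, which is not associative. The sketch below is an assumption-laden illustration, not the authors' method: the variable names and the tolerance <code>1e-12</code> are hypothetical choices. It shows two evaluation orders that disagree bit for bit yet agree within a stated tolerance, much as two physical measurements would be compared.

```python
import math

# The same three terms summed in two different orders, as could happen
# when a parallel reduction groups the operands differently.
grouped_left = (0.1 + 0.2) + 0.3
grouped_right = 0.1 + (0.2 + 0.3)

# Bitwise comparison: the two results are not identical.
bitwise_equal = grouped_left == grouped_right

# Measurement-style comparison: agreement within a stated relative tolerance.
# The tolerance value is an assumption chosen for this illustration.
agree_within_tolerance = math.isclose(grouped_left, grouped_right, rel_tol=1e-12)
```

Under IEEE 754 double precision, <code>bitwise_equal</code> is false while <code>agree_within_tolerance</code> is true, so whether the computation counts as reproduced depends on the stated precision.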


=== 2008 ===
=== 2008 ===
* (JCGM, 2008) ⇒ JCGM Working Group 1 (2008). [http://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf JCGM 100:2008 Evaluation of measurement data – Guide to the expression of uncertainty in measurement (PDF)], Joint Committee for Guides in Metrology, 2008. GUM 1995 with minor corrections.
* (JCGM, 2008) ⇒ JCGM Working Group 1 (2008). [http://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf JCGM 100:2008 Evaluation of measurement data – Guide to the expression of uncertainty in measurement (PDF)], Joint Committee for Guides in Metrology, 2008. GUM 1995 with minor corrections.
** QUOTE: '''[[Reproducibility|reproducibility (of results of measurements)]]''' - [[closeness]] of the agreement between the results of [[measurement]]s of the same [[measurand]] carried out under changed [[condition]]s of [[measurement]].
** QUOTE: <B>[[Reproducibility Measure|reproducibility (of results of measurements)]]</B> - [[closeness]] of the agreement between the results of [[measurement]]s of the same [[measurand]] carried out under changed [[condition]]s of [[measurement]].
*** NOTE 1: A [[valid statement]] of [[reproducibility]] requires specification of the [[condition]]s changed.
*** NOTE 1: A [[valid statement]] of [[Reproducibility Measure|reproducibility]] requires specification of the [[condition]]s changed.
*** NOTE 2: The changed [[condition]]s may include:
*** NOTE 2: The changed [[condition]]s may include:
**** [[principle of measurement]];
**** [[principle of measurement]];
Line 49: Line 75:
**** [[condition]]s of use
**** [[condition]]s of use
**** [[time]].
**** [[time]].
*** NOTE 3: [[Reproducibility]] may be expressed [[quantitatively]] in terms of the [[dispersion]] characteristics of the [[result]]s.
*** NOTE 3: [[Reproducibility Measure|Reproducibility]] may be expressed [[quantitatively]] in terms of the [[dispersion]] characteristics of the [[result]]s.
*** NOTE 4: Results are here usually understood to be corrected results.
*** NOTE 4: Results are here usually understood to be corrected results.
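
NOTE 3 above says that reproducibility may be expressed quantitatively through the dispersion of the results. One simple way to do that is sketched below, assuming hypothetical data and a hypothetical helper named <code>reproducibility_sd</code>. It takes the sample standard deviation of the mean result obtained under each changed condition. This is only one possible dispersion statistic and is not presented as the estimator prescribed by the JCGM guide.

```python
import statistics


def reproducibility_sd(results_by_condition):
    """Sample standard deviation of the per-condition mean results.

    Hypothetical helper: the keys identify the changed condition
    (for example the observer or the laboratory), and the values
    are the corrected results obtained under that condition.
    """
    means = [statistics.mean(values) for values in results_by_condition.values()]
    return statistics.stdev(means)


# Hypothetical corrected results for the same measurand from three laboratories.
results = {
    "lab_A": [10.1, 10.2, 10.0],
    "lab_B": [10.4, 10.5, 10.3],
    "lab_C": [9.9, 10.0, 9.8],
}

spread = reproducibility_sd(results)
```

A smaller <code>spread</code> indicates closer agreement between the results obtained under the changed conditions. In line with NOTE 1, the statement is only meaningful together with a description of which conditions were changed (here, the laboratory).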


=== 1994 ===
=== 1994 ===
* (Taylor & Kuyatt, 1994) ⇒ [[Barry N. Taylor]], and [[Chris E. Kuyatt]] (1994). [https://www.nist.gov/pml/nist-technical-note-1297 NIST Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results], Gaithersburg, MD, USA: National Institute of Standards and Technology.
* (Taylor & Kuyatt, 1994) ⇒ [[Barry N. Taylor]], and [[Chris E. Kuyatt]] (1994). [https://www.nist.gov/pml/nist-technical-note-1297 NIST Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results], Gaithersburg, MD, USA: National Institute of Standards and Technology.
 
----
----
__NOTOC__
[[Category:Concept]]
[[Category:Concept]]
__NOTOC__

Latest revision as of 04:37, 24 June 2024

A Reproducibility Measure is a similarity measure of the agreement between results of measurements of the same measurand carried out under the same operating conditions over a time period, by different observers, or under changed conditions of measurement.



References

2019a

  • (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Reproducibility Retrieved:2019-10-19.
    • Reproducibility is the closeness of the agreement between the results of measurements of the same measurand carried out with same methodology described in the corresponding scientific evidence (e.g. a publication in a peer-reviewed journal). Reproducibility can also be applied under changed conditions of measurement for the same measurand - to check that the results are not an artefact of the measurement procedures [1] [2]. A related concept is replication, which is the ability to independently achieve non-identical conclusions that are at least similar, when differences in sampling, research procedures and data analysis methods may exist. Reproducibility and replicability together are among the main tools of "the scientific method" — with the concrete expressions of the ideal of such a method varying considerably across research disciplines and fields of study. The study of reproducibility is an important topic in metascience.

2019b

2018

{| class="wikitable"
|-
!Goodman!!Claerbout!!ACM
|-
| || ||Repeatability
|-
|Methods Reproducibility||Reproducibility||Replicability
|-
|Results Reproducibility||Replicability||Reproducibility
|-
|Inferential Reproducibility|| ||
|}

Table 1: Comparison of terminologies. See text for details.

2016a

The concepts of repeatability and reproducibility are taken directly from the VIM. Repeatability is something we expect of any well-controlled experiment. Results that are not repeatable are rarely suitable for publication. The proposed intermediate concept of replicability stems from the unique properties of computational experiments, i.e., that the measurement procedure/system, being virtual, is more easily portable, enabling inspection and exercise by others. While reproducibility is the ultimate goal, this initiative seeks to take an intermediate step, that is, to promote practices that lead to better replicability. We fully acknowledge that simple replication of results using author-supplied artifacts is a weak form of reproducibility. Nevertheless, it is an important first step, and the auditing processes that go well beyond traditional refereeing will begin to raise the bar for experimental research in computing.

2016b

2015

2008

1994