Measure of Agreement
A Measure of Agreement is a performance measure that quantifies the degree to which different predictors agree on the same multi-agent prediction task.
- AKA: Inter-Rater Reliability, Inter-Rater Agreement, Inter-Rater Concordance, Inter-Observer Reliability, Inter-Coder Reliability.
- Context:
- It can range from being a Measure of Classification Agreement to being a Measure of Ranking Agreement to being a Measure of Estimation Agreement.
- Example(s):
- Counter-Example(s):
- See: Manual Annotation Task, Intra-Class Correlation, Consensus, Concordance Correlation Coefficient.
References
2022a
- (Wikipedia, 2022) ⇒ https://en.wikipedia.org/wiki/Inter-rater_reliability Retrieved:2022-3-20.
- In statistics, inter-rater reliability (also called by various similar names, such as inter-rater agreement, inter-rater concordance, inter-observer reliability, inter-coder reliability, and so on) is the degree of agreement among independent observers who rate, code, or assess the same phenomenon.
Assessment tools that rely on ratings must exhibit good inter-rater reliability, otherwise they are not valid tests.
There are a number of statistics that can be used to determine inter-rater reliability. Different statistics are appropriate for different types of measurement. Some options are joint-probability of agreement, such as Cohen's kappa, Scott's pi and Fleiss' kappa; or inter-rater correlation, concordance correlation coefficient, intra-class correlation, and Krippendorff's alpha.
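For the joint-probability family mentioned above, Cohen's kappa is perhaps the most common case for two raters: it corrects the raw percentage of agreement for the agreement expected by chance from each rater's label frequencies. A minimal sketch in plain Python (the function name, example labels, and data are illustrative, not from the source):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items (nominal labels)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters label identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two annotators labeling 8 items: 75% raw agreement, 50% expected by chance.
a = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
b = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]
print(cohens_kappa(a, b))  # 0.5
```

Kappa of 1 indicates perfect agreement, 0 indicates agreement no better than chance; values above roughly 0.6 are often read as substantial agreement, though such cutoffs are conventions rather than part of the statistic.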
2022b
- (Wikipedia, 2022) ⇒ https://en.wikipedia.org/wiki/Glossary_of_clinical_research#I Retrieved:2022-3-20.
- Inter-rater reliability
- The property of yielding equivalent results when used by different raters on different occasions. (ICH E9)
2008
- (Upton & Cook, 2008) ⇒ Graham Upton, and Ian Cook. (2008). “A Dictionary of Statistics, 2nd Edition Revised.” Oxford University Press. ISBN:0199541450
- QUOTE: measure of agreement: A single statistic used to summarize the agreement between the rankings or classifications of objects made by two or more observers. Examples are the coefficient of concordance, Cohen’s kappa, and rank correlation coefficients.
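For the ranking case named in the dictionary entry, the coefficient of concordance (Kendall's W) summarizes how closely m raters' rankings of n objects coincide. A minimal sketch assuming untied rankings (the function name and example rankings are illustrative):

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance W for m raters ranking n objects.

    rankings: list of m lists, each a permutation of ranks 1..n (no ties).
    Returns a value in [0, 1]: 1 means identical rankings, 0 means no concordance.
    """
    m, n = len(rankings), len(rankings[0])
    # Rank sum for each object across all raters.
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    # Under no concordance every object's rank sum is near this mean.
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Three raters ranking four objects in exactly the same order.
print(kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))  # 1.0
```

Unlike Cohen's kappa, which applies to nominal classifications, W is defined on ordinal rankings, matching the dictionary's distinction between agreement on classifications and agreement on rankings.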
1993
- (James et al., 1993) ⇒ Lawrence R. James, Robert G. Demaree, and Gerrit Wolf. (1993). “rwg: An assessment of within-group interrater agreement.” In: Journal of Applied Psychology, 78(2).
- QUOTE: F. L. Schmidt and J. E. Hunter (1989) critiqued the within-group interrater reliability statistic (rwg) described by L. R. James et al (1984). S. W. Kozlowski and K. Hattrup (1992) responded to the Schmidt and Hunter critique and argued that rwg is a suitable index of interrater agreement. This article focuses on the interpretation of rwg as a measure of agreement among judges' ratings of a single target. A new derivation of rwg is given that underscores this interpretation.