Average Pairwise Pearson Correlation Coefficient
An Average Pairwise Pearson Correlation Coefficient is a Mean Value that is calculated by averaging the Pearson correlation coefficients over all pairs of variables (for example, all pairs of annotators in an annotation task).
- Example(s):
- Counter-Example(s):
- See: Pairwise Correlation Matrix, Spearman Correlation Coefficient, Pearson Correlation Coefficient.
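A minimal sketch of the computation in Python (the function name and the toy annotator scores below are illustrative assumptions, not from the cited work): compute the Pearson correlation for every pair of score vectors and average the results.

```python
from itertools import combinations
import numpy as np

def average_pairwise_pearson(score_sets):
    """Average Pearson r over all pairs of score vectors (e.g., annotators)."""
    pairs = combinations(range(len(score_sets)), 2)
    rs = [np.corrcoef(score_sets[i], score_sets[j])[0, 1] for i, j in pairs]
    return float(np.mean(rs))

# Toy example: similarity judgments from three annotators on five word pairs.
a1 = [3.5, 1.0, 2.0, 4.0, 0.5]
a2 = [3.0, 1.5, 2.5, 3.5, 1.0]
a3 = [4.0, 0.5, 2.0, 4.0, 0.0]
print(average_pairwise_pearson([a1, a2, a3]))
```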
References
2017
- (Camacho-Collados et al., 2017) ⇒ Jose Camacho-Collados, Mohammad Taher Pilehvar, Nigel Collier, and Roberto Navigli. (2017). “SemEval-2017 Task 2: Multilingual and Cross-lingual Semantic Word Similarity.” In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval@ACL 2017).
- QUOTE: Table 3 (first row) reports the average pairwise Pearson correlation among the three annotators for each of the five languages. Given the fact that our word pairs spanned a wide range of domains, and that there was a possibility for annotators to misunderstand some words, we devised a procedure to check the quality of the annotations and to improve the reliability of the similarity scores. To this end, for each dataset and for each annotator we picked the subset of pairs for which the difference between the assigned similarity score and the average of the other two annotations was more than 1.0, according to our similarity scale. The annotator was then asked to revise this subset performing a more careful investigation of the possible meanings of the word pairs contained therein, and change the score if necessary. This procedure resulted in considerable improvements in the consistency of the scores. The second row in Table 3 (“Revised scores”) shows the average pairwise Pearson correlation among the three revised sets of scores for each of the five languages. The inter-annotator agreement for all the datasets is consistently in the 0.9 ballpark, which demonstrates the high quality of our multilingual datasets thanks to careful annotation of word pairs by experts.
Table 3: Average pairwise Pearson correlation among the three annotators for each of the five languages.

| | English | Farsi | German | Italian | Spanish |
|---|---|---|---|---|---|
| Initial scores | 0.836 | 0.839 | 0.864 | 0.798 | 0.829 |
| Revised scores | 0.893 | 0.906 | 0.916 | 0.900 | 0.890 |
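The revision step quoted above flags, for each annotator, the pairs whose score differs by more than 1.0 from the average of the other annotators' scores. A minimal sketch of that flagging criterion (assuming scores are stored as an annotator-by-item array; the function name is a hypothetical helper, not from the paper):

```python
import numpy as np

def flag_disagreements(scores, threshold=1.0):
    """Return, for each annotator, the item indices where that annotator's
    score differs from the mean of the other annotators' scores by more
    than `threshold` (the 1.0-point criterion described in the quote)."""
    scores = np.asarray(scores, dtype=float)   # shape: (n_annotators, n_items)
    flagged = {}
    for a in range(scores.shape[0]):
        others_mean = np.delete(scores, a, axis=0).mean(axis=0)
        flagged[a] = np.where(np.abs(scores[a] - others_mean) > threshold)[0].tolist()
    return flagged
```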