SemEval-2017 Task 2
A SemEval-2017 Task 2 is a SemEval-2017 Task on Multilingual and Cross-lingual Semantic Word Similarity that serves as a benchmark for evaluating and comparing multilingual and cross-lingual semantic word similarity systems.
- AKA: SemEval-2017 Semantic Word Similarity Benchmark Task.
- Context:
- Task Input: Natural Language Data (word pairs).
- Task Output: a Semantic Similarity Score for word pairs within and across English, Farsi, German, Italian, and Spanish (see the scoring sketch after this list).
- Task Requirement(s):
- It is divided into 2 subtasks:
- Subtask 1: Multi-lingual Semantic Word Similarity Task,
- Subtask 2: Cross-lingual Semantic Word Similarity Task.
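A participating system, for either subtask, must map every trial word pair to a single numeric similarity score. The sketch below illustrates this input/output contract with cosine similarity over a hypothetical `EMBEDDINGS` lookup of cross-lingually aligned word vectors (the toy vectors are illustrative only, not taken from any participating system); because the evaluation relies on correlation with the gold ratings, the absolute scale of the output does not matter.

```python
import numpy as np

# Hypothetical lookup of cross-lingually aligned word vectors, keyed by
# (language code, word); any shared multilingual vector space could be used.
EMBEDDINGS = {
    ("en", "dog"):   np.array([0.21, 0.83, 0.10]),
    ("es", "perro"): np.array([0.19, 0.80, 0.12]),
    ("de", "Hund"):  np.array([0.22, 0.81, 0.09]),
}

def score_pair(word1, lang1, word2, lang2):
    """Assign a similarity score to a (possibly cross-lingual) word pair."""
    v1 = EMBEDDINGS[(lang1, word1)]
    v2 = EMBEDDINGS[(lang2, word2)]
    # Cosine similarity of the two word vectors.
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

print(score_pair("dog", "en", "perro", "es"))  # cross-lingual English-Spanish pair
```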
- Example(s):
- Monolingual semantic word similarity (SWS) system performance per language, reported as Pearson correlation ($r$), Spearman correlation ($\rho$), and their harmonic mean (Final):
| System | English $r$ | English $\rho$ | English Final | Farsi $r$ | Farsi $\rho$ | Farsi Final | German $r$ | German $\rho$ | German Final | Italian $r$ | Italian $\rho$ | Italian Final | Spanish $r$ | Spanish $\rho$ | Spanish Final |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Luminoso_run2 | 0.78 | 0.80 | 0.79 | 0.51 | 0.50 | 0.50 | 0.70 | 0.70 | 0.70 | 0.73 | 0.75 | 0.74 | 0.73 | 0.75 | 0.74 |
| Luminoso_run1 | 0.78 | 0.79 | 0.79 | 0.51 | 0.50 | 0.50 | 0.69 | 0.69 | 0.69 | 0.73 | 0.75 | 0.74 | 0.73 | 0.75 | 0.74 |
| QLUT_run1* | 0.78 | 0.78 | 0.78 | − | − | − | − | − | − | − | − | − | − | − | − |
| HCCL_run1* | 0.68 | 0.70 | 0.69 | 0.42 | 0.45 | 0.44 | 0.58 | 0.61 | 0.59 | 0.63 | 0.67 | 0.65 | 0.69 | 0.72 | 0.70 |
| NASARI (baseline) | 0.68 | 0.68 | 0.68 | 0.41 | 0.40 | 0.41 | 0.51 | 0.51 | 0.51 | 0.60 | 0.59 | 0.60 | 0.60 | 0.60 | 0.60 |
| QLUT_run2* | 0.67 | 0.67 | 0.67 | − | − | − | − | − | − | − | − | − | − | − | − |
| SEW_run2 (a.d.) | 0.56 | 0.58 | 0.57 | 0.38 | 0.40 | 0.39 | 0.45 | 0.45 | 0.45 | 0.57 | 0.57 | 0.57 | 0.61 | 0.62 | 0.62 |
| SEW_run1 | 0.37 | 0.41 | 0.39 | 0.38 | 0.40 | 0.39 | 0.45 | 0.45 | 0.45 | 0.57 | 0.57 | 0.57 | 0.61 | 0.62 | 0.62 |
| Mahtab_run2* | − | − | − | 0.72 | 0.71 | 0.71 | − | − | − | − | − | − | − | − | − |
| Mahtab_run1* | − | − | − | 0.72 | 0.71 | 0.71 | − | − | − | − | − | − | − | − | − |
- Subtask 1 global ranking (combined score over the individual languages):

| System | Score | Official Rank |
|---|---|---|
| Luminoso_run2 | 0.743 | 1 |
| Luminoso_run1 | 0.740 | 2 |
| HCCL_run1* | 0.658 | 3 |
| NASARI (baseline) | 0.598 | − |
| RUFINO_run1* | 0.555 | 4 |
| SEW_run2 (a.d.) | 0.552 | − |
| SEW_run1 | 0.506 | 5 |
| RUFINO_run2* | 0.369 | 6 |
| hjpwhuer_run1 | 0.018 | 7 |
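The $r$, $\rho$, and Final columns above reflect the official per-dataset evaluation: Pearson and Spearman correlation between a system's scores and the gold similarity ratings, combined into a final score as their harmonic mean. A minimal sketch of that computation with SciPy, using toy score lists rather than official submission data:

```python
from scipy.stats import pearsonr, spearmanr

def evaluate_dataset(system_scores, gold_scores):
    """Per-dataset evaluation: Pearson r, Spearman rho, and their harmonic mean
    (the 'Final' column in the tables above)."""
    r, _ = pearsonr(system_scores, gold_scores)
    rho, _ = spearmanr(system_scores, gold_scores)
    final = 2 * r * rho / (r + rho)  # harmonic mean of the two correlations
    return r, rho, final

# Toy example (illustrative scores, not an official run):
print(evaluate_dataset([3.2, 1.1, 0.4, 2.8], [3.5, 0.9, 0.2, 3.0]))
```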
- Cross-lingual SWS system performance per language pair, reported as Pearson correlation ($r$), Spearman correlation ($\rho$), and their harmonic mean (Final):
| System | German-Spanish $r$ | German-Spanish $\rho$ | German-Spanish Final | German-Farsi $r$ | German-Farsi $\rho$ | German-Farsi Final | German-Italian $r$ | German-Italian $\rho$ | German-Italian Final | English-German $r$ | English-German $\rho$ | English-German Final | English-Spanish $r$ | English-Spanish $\rho$ | English-Spanish Final |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Luminoso_run2 | 0.72 | 0.74 | 0.73 | 0.59 | 0.59 | 0.59 | 0.74 | 0.75 | 0.74 | 0.76 | 0.77 | 0.76 | 0.75 | 0.77 | 0.76 |
| Luminoso_run1 | 0.72 | 0.73 | 0.72 | 0.59 | 0.59 | 0.59 | 0.73 | 0.74 | 0.73 | 0.75 | 0.77 | 0.76 | 0.75 | 0.77 | 0.76 |
| NASARI (baseline) | 0.55 | 0.55 | 0.55 | 0.46 | 0.45 | 0.46 | 0.56 | 0.56 | 0.56 | 0.60 | 0.59 | 0.60 | 0.64 | 0.63 | 0.63 |
| OoO_run1 | 0.54 | 0.56 | 0.55 | − | − | − | 0.54 | 0.55 | 0.55 | 0.56 | 0.58 | 0.57 | 0.58 | 0.59 | 0.58 |
| SEW_run2 (a.d.) | 0.52 | 0.54 | 0.53 | 0.42 | 0.44 | 0.43 | 0.52 | 0.52 | 0.52 | 0.50 | 0.53 | 0.51 | 0.59 | 0.60 | 0.59 |
| SEW_run1 | 0.52 | 0.54 | 0.53 | 0.42 | 0.44 | 0.43 | 0.52 | 0.52 | 0.52 | 0.46 | 0.47 | 0.46 | 0.50 | 0.51 | 0.50 |
| HCCL_run2* (a.d.) | 0.42 | 0.39 | 0.41 | 0.33 | 0.28 | 0.30 | 0.38 | 0.34 | 0.36 | 0.49 | 0.48 | 0.48 | 0.55 | 0.56 | 0.55 |
| RUFINO_run1† | 0.31 | 0.32 | 0.32 | 0.23 | 0.25 | 0.24 | 0.32 | 0.33 | 0.33 | 0.33 | 0.34 | 0.33 | 0.34 | 0.34 | 0.34 |
| RUFINO_run2† | 0.30 | 0.30 | 0.30 | 0.26 | 0.27 | 0.27 | 0.22 | 0.24 | 0.23 | 0.30 | 0.30 | 0.30 | 0.34 | 0.33 | 0.34 |
| hjpwhu_run2 | 0.05 | 0.05 | 0.05 | 0.01 | 0.01 | 0.01 | 0.06 | 0.05 | 0.05 | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 |
| hjpwhu_run1 | 0.05 | 0.05 | 0.05 | 0.01 | 0.01 | 0.01 | 0.06 | 0.05 | 0.05 | -0.01 | -0.01 | 0.00 | 0.04 | 0.04 | 0.04 |
| HCCL_run1* | 0.03 | 0.02 | 0.02 | 0.03 | 0.02 | 0.02 | 0.03 | -0.01 | 0.00 | 0.34 | 0.28 | 0.31 | 0.10 | 0.08 | 0.09 |
| UniBuc-Sem_run1* | − | − | − | − | − | − | − | − | − | 0.05 | 0.06 | 0.06 | 0.08 | 0.10 | 0.09 |
| Citius_run1† | − | − | − | − | − | − | − | − | − | − | − | − | 0.57 | 0.59 | 0.58 |
| Citius_run2† | − | − | − | − | − | − | − | − | − | − | − | − | 0.56 | 0.58 | 0.57 |
... |
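For the official rankings, the per-dataset final scores are combined into a single global score per system: the average of a system's best per-dataset results (per the task paper, the best four of the five languages in Subtask 1 and the best six of the ten language pairs in Subtask 2), with systems covering fewer datasets left unranked. A minimal sketch of this combination step, assuming the per-dataset final scores are already available:

```python
def global_score(final_scores, top_n):
    """Average a system's best `top_n` per-dataset final scores; systems that
    cover fewer than `top_n` datasets receive no global score."""
    if len(final_scores) < top_n:
        return None
    return sum(sorted(final_scores, reverse=True)[:top_n]) / top_n

# Subtask 1 example: Luminoso_run2's per-language final scores from the table above
# (two-decimal table values, so this only approximates the official 0.743).
print(global_score([0.79, 0.50, 0.70, 0.74, 0.74], top_n=4))  # ≈ 0.7425
```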
- Counter-Example(s):
- See: Semantic Similarity Measure, Semantic Similarity Task, Semantic Word Similarity Task, Semantic Relatedness Measure, Semantic Analysis Task.
References
2017
- (Camacho-Collados et al., 2017) ⇒ Jose Camacho-Collados, Mohammad Taher Pilehvar, Nigel Collier, and Roberto Navigli. (2017). “SemEval-2017 Task 2: Multilingual and Cross-lingual Semantic Word Similarity.” In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval@ACL 2017).
- QUOTE: This paper introduces a new task on Multilingual and Cross-lingual Semantic Word Similarity which measures the semantic similarity of word pairs within and across five languages: English, Farsi, German, Italian and Spanish. High quality datasets were manually curated for the five languages with high inter-annotator agreements (consistently in the 0.9 ballpark). These were used for semi-automatic construction of ten cross-lingual datasets. 17 teams participated in the task, submitting 24 systems in subtask 1 and 14 systems in subtask 2. Results show that systems that combine statistical knowledge from text corpora, in the form of word embeddings, and external knowledge from lexical resources are best performers in both subtasks.