HCCL Semantic Word Similarity System
An HCCL Semantic Word Similarity System is a multilingual and cross-lingual semantic word similarity system that combines a word embedding system with a machine translation system (a minimal scoring sketch follows the See links below).
- Context:
- It was developed by He et al. (2017).
- It was benchmarked at SemEval-2017 Task 2, ranking 3rd on the monolingual and 6th on the cross-lingual semantic word similarity subtask.
- …
- Example(s):
- Counter-Example(s):
- See: Semantic Word Similarity Benchmark Task, Semantic Textual Similarity Benchmark Task, Semantic Similarity Measure, Semantic Relatedness Measure.
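The monolingual subtask reduces to scoring each word pair by the cosine similarity of its skip-gram (word2vec) embeddings. Below is a minimal sketch of that scoring step using the gensim library; the toy corpus and hyperparameter values are illustrative assumptions, not He et al.'s actual training setup.

```python
# Minimal sketch of monolingual word similarity via skip-gram embeddings.
# The corpus and hyperparameters are illustrative assumptions, not the
# settings used by He et al. (2017).
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["a", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "animals"],
]

# sg=1 selects the skip-gram architecture used for the monolingual subtask.
model = Word2Vec(corpus, sg=1, vector_size=50, window=3, min_count=1, epochs=50)

# The similarity score for a word pair is the cosine of their embedding vectors.
print(model.wv.similarity("cat", "dog"))
```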
References
2017a
- (Camacho-Collados et al., 2017) ⇒ Jose Camacho-Collados, Mohammad Taher Pilehvar, Nigel Collier, and Roberto Navigli. (2017). “SemEval-2017 Task 2: Multilingual and Cross-lingual Semantic Word Similarity.” In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval@ACL 2017).
2017b
- (He et al., 2017) ⇒ Junqing He, Long Wu, Xuemin Zhao, and Yonghong Yan. (2017). “HCCL at SemEval-2017 Task 2: Combining Multilingual Word Embeddings and Transliteration Model for Semantic Similarity.” In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval@ACL 2017).
- QUOTE: In this task, we adopt different strategies for the two subtasks. We use word2vec for subtask 1, monolingual word similarity. For subtask 2, cross-lingual word similarity, we use a jointly optimized cross-lingual word representation in addition to a transliteration model. We build a cross-lingual word embedding system and a special machine translation system. Our approach has the following characteristics:
- Fast and efficient. Both word2vec and the cross-lingual word embedding tool have impressive speed (Coulmance et al., 2015) and do not need expensive annotated word-aligned data.
- Decreasing OOVs. Our translation system features a transliteration model that deals with OOVs outside the parallel corpus.
- (...)
We use skip-gram word embeddings directly for the monolingual subtask. For the cross-lingual subtask, we use English as the pivot language and train multilingual word embeddings using monolingual corpora and sentence-aligned parallel data. A translation model is also trained by our statistical machine translation system. Subsequently, we translate the words in the test set into English and look up their word embeddings. For those missing from the English word embeddings, we fall back to the original-language word embeddings.
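The cross-lingual lookup described in the quote (translate each test word into the English pivot, take its English embedding, and fall back to the source-language side when the translation is out of vocabulary) can be sketched as follows. The translation table and embedding dictionaries here are hypothetical toy stand-ins for the authors' SMT/transliteration system and their jointly optimized cross-lingual embeddings; the fallback only yields comparable vectors because both languages share one jointly trained embedding space.

```python
import numpy as np
from typing import Optional

# Hypothetical toy stand-ins for the paper's components: a translation table
# from the SMT/transliteration system, English-pivot embeddings, and
# source-language embeddings living in the same jointly trained space.
translation_table = {"gato": "cat", "perro": "dog"}            # es -> en
english_vecs = {"cat": np.array([0.9, 0.1]), "dog": np.array([0.7, 0.3])}
source_vecs = {"felino": np.array([0.8, 0.2])}                 # fallback space

def embed(word: str) -> Optional[np.ndarray]:
    """Translate into the English pivot and look up its embedding; fall back
    to the source-language embeddings when the translation is missing."""
    translation = translation_table.get(word)
    if translation in english_vecs:
        return english_vecs[translation]
    return source_vecs.get(word)

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

a, b = embed("gato"), embed("perro")
if a is not None and b is not None:
    print(cosine(a, b))   # similarity score for the cross-lingual pair
```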