2017 HCCLatSemEval2017Task2Combining
- (He et al., 2017) ⇒ Junqing He, Long Wu, Xuemin Zhao, and Yonghong Yan. (2017). “HCCL at SemEval-2017 Task 2: Combining Multilingual Word Embeddings and Transliteration Model for Semantic Similarity.” In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval@ACL 2017).
Subject Headings: HCCL; SemEval-2017; SemEval-2017 Task 2; Semantic Word Similarity System; Semantic Word Similarity Benchmark Task; Multilingual And Cross-Lingual Semantic Word Similarity System; Machine Translation.
Notes
Cited By
- Google Scholar: ~ 2 Citations
Quotes
Abstract
In this paper, we introduce an approach that combines word embeddings and machine translation for multilingual semantic word similarity, Task 2 of SemEval-2017. Thanks to the unsupervised transliteration model, our cross-lingual word embeddings encounter far fewer out-of-vocabulary (OOV) words. Our results are produced using only monolingual Wikipedia corpora and a limited amount of sentence-aligned data. Although relatively few resources are utilized, our system ranked 3rd in the monolingual subtask and could rank 6th in the cross-lingual subtask.
1. Introduction
...
In this task, we adopt different strategies for the two subtasks. We use word2vec for Subtask 1, monolingual word similarity. For Subtask 2, cross-lingual word similarity, we use jointly optimized cross-lingual word representations in addition to a transliteration model. We build a cross-lingual word embedding system and a special machine translation system. Our approach has the following characteristics:
- Fast and efficient. Both word2vec and the cross-lingual word embedding tool have impressive speed (Coulmance et al., 2015) and do not need expensive annotated word-aligned data.
- Decreasing OOVs. Our translation system features a transliteration model that deals with OOVs outside the parallel corpus.
We constructed a naive system and, given the limited time, did not tune the parameters of the embedding and translation models.
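The transliteration idea above can be illustrated with a minimal sketch: when a word falls outside the parallel corpus, it is mapped character by character into the target script so a lookup still has a chance of succeeding. The character table and function below are illustrative assumptions, not the paper's learned (unsupervised) transliteration model.

```python
# Toy character-level transliteration table (illustrative only; the
# paper learns its transliteration model unsupervised from data).
TRANSLIT = {"к": "k", "о": "o", "т": "t"}

def transliterate(word):
    """Map each character through the table; pass unknown characters through."""
    return "".join(TRANSLIT.get(ch, ch) for ch in word)

print(transliterate("кот"))  # → "kot"
```

An OOV like "кот" thus becomes "kot", which can then be matched against the English vocabulary even though the word pair never appeared in the parallel data.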
2. Our Approach
We use skip-gram word embeddings directly for the monolingual subtask. For the cross-lingual subtask, we use English as the pivot language and train multilingual word embeddings using monolingual corpora and sentence-aligned parallel data. A translation model is also trained with our statistical machine translation system. Subsequently, we translate the words in the test set into English and look up their word embeddings. For words missing from the English word embeddings, we fall back to the word embeddings of the original language.
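The translate-then-look-up pipeline with a fallback can be sketched as follows. All embedding vectors, the translation table, and the function names here are illustrative assumptions, not the paper's actual data or code; the fallback only makes sense because the paper's source- and English-language embeddings are jointly optimized in one shared space.

```python
import math

# Toy shared-space embeddings and translation table (illustrative values).
en_emb = {"dog": [1.0, 0.0], "cat": [0.9, 0.1]}
es_emb = {"perro": [1.0, 0.1], "gato": [0.8, 0.2]}
translate = {"perro": "dog"}  # "gato" is deliberately left untranslated (an OOV)

def lookup(word, src_emb):
    """Translate into English and use the English embedding;
    fall back to the source-language embedding for OOVs."""
    en_word = translate.get(word)
    if en_word in en_emb:
        return en_emb[en_word]
    return src_emb.get(word)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# "perro" is translated and scored via the English embedding;
# "gato" falls back to its source-language vector in the shared space.
sim = cosine(lookup("perro", es_emb), lookup("gato", es_emb))
print(sim)
```

Because both lookups return vectors in the same shared space, the cosine similarity is meaningful whether or not the translation step succeeded.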
...
3. Experiments
4. Results
5. Conclusion
Acknowledgments
Footnotes
References
BibTeX
@inproceedings{2017_HCCLatSemEval2017Task2Combining,
  author    = {Junqing He and Long Wu and Xuemin Zhao and Yonghong Yan},
  editor    = {Steven Bethard and Marine Carpuat and Marianna Apidianaki and Saif M. Mohammad and Daniel M. Cer and David Jurgens},
  title     = {HCCL at SemEval-2017 Task 2: Combining Multilingual Word Embeddings and Transliteration Model for Semantic Similarity},
  booktitle = {Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval@ACL 2017)},
  pages     = {220--225},
  publisher = {Association for Computational Linguistics},
  year      = {2017},
  url       = {https://doi.org/10.18653/v1/S17-2033},
  doi       = {10.18653/v1/S17-2033},
}
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2017 HCCLatSemEval2017Task2Combining | Junqing He; Long Wu; Xuemin Zhao; Yonghong Yan | | | HCCL at SemEval-2017 Task 2: Combining Multilingual Word Embeddings and Transliteration Model for Semantic Similarity | | | | | | 2017 |