Polyglot System
A Polyglot System is a Word Embedding System that produces word vector representations by training a skipgram model on a multi-lingual corpus.
- Context:
- Its source code is available at https://github.com/aboSamoor/polyglot
- It is also an Out-Of-Vocabulary (OOV) Embedding System.
- It was first introduced by Al-Rfou et al. (2013).
- Example(s):
- Counter-Example(s):
- See: DeepWalk, SpeedRead, WikiTalk, In-Vocabulary Word, Unseen Word, Rare Word, NLP System, Text Classification System, Subword Unit, Subword Embedding System, In-Vocabulary Embedding System.
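The skipgram training named in the definition above starts from (center, context) word pairs drawn from a sliding window over each sentence. The following is a minimal sketch of that pair-generation step only, not the Polyglot implementation; the function name `skipgram_pairs`, the `window` parameter, and the two-sentence corpus are hypothetical and chosen for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for tokens within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical toy multi-lingual corpus: one English and one Spanish sentence.
corpus = [
    "the cat sleeps".split(),
    "el gato duerme".split(),
]

# Pairs are generated per sentence, so no pair crosses a sentence boundary.
pairs = [p for sentence in corpus for p in skipgram_pairs(sentence, window=1)]
```

In this toy corpus every pair joins two words of the same language, because each sentence is monolingual and pairs never span sentences. A skipgram model would then be trained to predict the context word from the center word.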
References
2021
- (Polyglot) ⇒ https://polyglot.readthedocs.io/en/latest/Embeddings.html Retrieved:2021-05-09.
- QUOTE: Word embedding is a mapping of a word to a d-dimensional vector space. This real valued vector representation captures semantic and syntactic features. Polyglot offers a simple interface to load several formats of word embeddings.
from polyglot.mapping import Embedding
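To make the idea of a word-to-vector mapping concrete, here is a small sketch in plain Python. It does not use the polyglot API; the three-dimensional vectors and the helper names `cosine` and `nearest` are hypothetical, and real embeddings have far more dimensions.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration.
embeddings = {
    "green": [0.9, 0.1, 0.0],
    "blue": [0.8, 0.2, 0.1],
    "dog": [0.0, 0.9, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, table, top_k=1):
    """Return the top_k words most similar to `word`, excluding the word itself."""
    query = table[word]
    scored = [(cosine(query, vec), other)
              for other, vec in table.items() if other != word]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:top_k]]
```

With these toy vectors, `nearest("green", embeddings)` ranks "blue" above "dog", since the two color words point in nearly the same direction.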
2020
- (Palakodety, 2020) ⇒ Shriphani Palakodety (2020). "Polyglot Word Embeddings Discover Language Clusters".
- QUOTE: Polyglot word embeddings obtained by training a skipgram model on a multi-lingual corpus discover extremely high-quality language clusters.
These can be trivially retrieved using an algorithm like k-Means giving us a fully unsupervised language identification system.
Experiments show that these clusters are on-par with results produced by popular open source (FastText LangID) and commercial models (Google Cloud Translation).
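The clustering step described in the quote can be sketched with a tiny k-means in plain Python. This is an illustrative sketch under stated assumptions, not the author's code: the two-dimensional vectors are invented so that English and Spanish words fall in separate regions, and the centroids are initialized from the first `k` points to keep the run deterministic (practical implementations usually use random or k-means++ initialization).

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iterations=10):
    """Return a cluster label for each point (simplified, deterministic k-means)."""
    # Assumption for this sketch: the first k points seed the centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its closest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical word vectors: English words on one side, Spanish on the other.
words = ["cat", "gato", "dog", "perro"]
vectors = [[1.0, 0.1], [-1.0, 0.2], [0.9, -0.1], [-0.9, -0.2]]

labels = kmeans(vectors, k=2)
clusters = dict(zip(words, labels))
```

On this toy input "cat" and "dog" share one cluster and "gato" and "perro" share the other. No language labels are supplied at any point, which is what makes the approach unsupervised.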
2013
- (Al-Rfou et al., 2013) ⇒ Rami Al-Rfou, Bryan Perozzi, and Steven Skiena. (2013). “Polyglot: Distributed Word Representations for Multilingual NLP.” In: Proceedings of the Seventeenth Conference on Computational Natural Language Learning (CoNLL 2013).