Indra System
An Indra System is a Word Embedding System that supports the creation, use and evaluation of word embedding models.
- Context:
- It was first introduced by Sales et al. (2018).
- It is divided into two modules: Indra-Indexer (which generates the word embedding models, or WEMs) and Indra (which implements the methods for consuming those models).
- It is designed to work as a stand-alone open-source library and also as a web service.
- Example(s):
- PyIndra (indra 1.21.0), a Python implementation by Benjamin Gyori (2021),
- …
- Counter-Example(s):
- BERT System (Devlin et al., 2019),
- DISSECT System (Dinu et al., 2013),
- ELMo System (Peters et al., 2018),
- fastText System (Bojanowski et al., 2017),
- Flair Word Embedding System (Akbik et al., 2018),
- GenSim System (Rehurek & Sojka, 2010),
- GloVe System (Pennington et al., 2014),
- JoBimText System (Biemann & Riedl, 2013),
- MIMICK System (Pinter et al., 2017),
- MorphoRNN Embedding System (Luong et al., 2013),
- Polyglot System (Al-Rfou et al., 2013),
- SENNA System (Collobert & Weston, 2008),
- S-Space Word Embedding System (Jurgens & Stevens, 2010),
- SumEmbed System (Botha & Blunsom, 2014),
- VarEmbed System (Bhatia et al., 2016),
- Word2Vec System (Mikolov et al., 2014).
- See: Character Embedding System, Out-Of-Vocabulary (OOV) Embedding System, Subword Embedding System, Natural Language Processing System, Machine Translation System, Translation-based Word Embedding System, Semantic Relatedness, Word Similarity Task, Word Analogy Task.
References
2018
- (Sales et al., 2018) ⇒ Juliano Efson Sales, Leonardo Souza, Siamak Barzegar, Brian Davis, Andre Freitas, and Siegfried Handschuh. (2018). “Indra: A Word Embedding and Semantic Relatedness Server.” In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).
- QUOTE: To support this demand, this paper describes INDRA, a word embedding/distributional semantics framework which supports the creation, use and evaluation of word embedding models. INDRA provides a software infrastructure to facilitate the experimentation and customisation of multilingual WEMs, allowing end-users and applications to consume and operate over multiple word embedding spaces as a service or library.
(...)
The INDRA PROJECT is divided into two major modules: INDRAINDEXER and INDRA. INDRAINDEXER is responsible for the generation of the models, whereas INDRA implements the consumption methods.
INDRA is designed to be a stand-alone library and also a web service. Figure 1 depicts the main components of its architecture. INDRAINDEXER supports the generation of WEMs directly from text files (Wikipedia-dump or plaintext formats), passing through the corpus pre-processing and multiword expression identification, to the model generation itself. INDRA dynamically builds the pipeline based on the metadata information produced during the model generation. This strategy guarantees that the same set of pre-processing operations are consistently applied to the input query. Additionally, the translation-based word embedding (Freitas et al., 2016; Barzegar et al., 2018b) can be conveniently activated in the pipeline as described in Section 4.
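The metadata-driven pipeline described in the quote above can be illustrated with a minimal sketch. This is not INDRA's actual API: the metadata keys, the toy vectors, and the names `build_pipeline`, `embed`, and `relatedness` are all hypothetical, and cosine similarity is assumed here as one common relatedness measure. The sketch only shows the idea that the pre-processing steps recorded when a model is generated are reapplied to every input query before the vectors are compared.

```python
import math

# Hypothetical metadata, standing in for what a model generator might record.
MODEL_METADATA = {"lowercase": True, "stopwords": ["the", "a", "of"]}

# Toy word vectors standing in for a trained word embedding model.
TOY_VECTORS = {
    "king": [0.9, 0.1, 0.3],
    "queen": [0.8, 0.2, 0.4],
    "apple": [0.1, 0.9, 0.2],
}


def build_pipeline(metadata):
    """Build a query pre-processor from the steps named in the metadata."""
    stopwords = set(metadata.get("stopwords", []))

    def preprocess(text):
        tokens = text.split()
        if metadata.get("lowercase"):
            tokens = [t.lower() for t in tokens]
        return [t for t in tokens if t not in stopwords]

    return preprocess


def embed(tokens, vectors):
    """Average the vectors of the known tokens; return None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    return [sum(component) / len(known) for component in zip(*known)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relatedness(term1, term2, vectors=TOY_VECTORS, metadata=MODEL_METADATA):
    """Score two terms after applying the model's own pre-processing to both."""
    preprocess = build_pipeline(metadata)
    u = embed(preprocess(term1), vectors)
    v = embed(preprocess(term2), vectors)
    if u is None or v is None:
        return 0.0  # out-of-vocabulary query in this toy model
    return cosine(u, v)
```

Under these assumptions, `relatedness("The King", "queen")` gives the same score as `relatedness("king", "queen")`, because the query is lowercased and stripped of stopwords exactly as the (hypothetical) training corpus was.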