GenSim System

From GM-RKB

A GenSim System is a Word Embedding System that builds semantic vectors from plain text documents by examining statistical co-occurrence patterns within a training corpus.
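
Gensim's own models (Word2Vec, LSI, LDA, and others) are considerably more sophisticated, but the distributional idea in this definition can be sketched with raw co-occurrence counts: words that appear in similar contexts end up with similar vectors. The following is a minimal, library-free illustration in plain Python. It is not the Gensim API. The helper names `cooccurrence_vectors` and `cosine`, the window size of 2, and the toy corpus are hypothetical choices made only for this example.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within +/- `window` positions.
    (Hypothetical helper: a raw-count stand-in for Gensim's learned vectors.)"""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the word itself
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * v[w] for w, count in u.items())  # missing keys count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy training corpus (hypothetical), already tokenized.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat chased a mouse".split(),
    "a dog chased a ball".split(),
    "stock prices rose sharply today".split(),
]
vecs = cooccurrence_vectors(corpus, window=2)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))    # 1.0
print(round(cosine(vecs["cat"], vecs["stock"]), 3))  # 0.0
```

In this toy corpus "cat" and "dog" occur in identical contexts, so their count vectors coincide, while "stock" shares no context words with "cat". Raw counts are only the starting point; the algorithms listed in the references below reweight, compress, or learn dense vectors from such statistics.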

Context:
- GitHub repository: https://github.com/RaRe-Technologies/gensim
- Python Package Index (PyPI) release page (version 0.13.1): https://pypi.org/project/gensim/0.13.1/
- It was first introduced by Rehurek & Sojka (2010).
- …
Example(s):
Counter-Example(s):
See: One-Hot Encoding System, DeepLearning4J, Word Similarity Task, Word Analogy Task, Distributional Co-Occurrence Word Vector, Character Embedding System, Graph Embedding System, Subword Embedding System.

References

2021

(Gensim, 2021) ⇒ https://radimrehurek.com/gensim/intro.html#what-is-gensim
- QUOTE: Gensim is a free open-source Python library for representing documents as semantic vectors, as efficiently (computer-wise) and painlessly (human-wise) as possible.
  Gensim is designed to process raw, unstructured digital texts (“plain text”) using unsupervised machine learning algorithms.
  The algorithms in Gensim, such as Word2Vec, FastText, Latent Semantic Indexing (LSI, LSA, LsiModel), Latent Dirichlet Allocation (LDA, LdaModel) etc, automatically discover the semantic structure of documents by examining statistical co-occurrence patterns within a corpus of training documents. These algorithms are unsupervised, which means no human input is necessary – you only need a corpus of plain text documents.
  Once these statistical patterns are found, any plain text documents (sentence, phrase, word ...) can be succinctly expressed in the new, semantic representation and queried for topical similarity against other documents (words, phrases...).
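
The quote describes two phases: discovering latent structure from co-occurrence statistics, then expressing new text in that representation and querying it for similarity. A minimal from-scratch sketch of that workflow for Latent Semantic Indexing, assuming NumPy, is shown below. LSI rests on a truncated singular value decomposition of the term-document matrix, and a new query q is "folded in" with the standard projection q_hat = S_k^-1 U_k^T q. This is not Gensim's LsiModel, which is designed to process corpora in a streamed fashion; the helper names, the toy corpus, and k=2 are hypothetical.

```python
import numpy as np

def term_doc_matrix(docs):
    """Return (word -> row index, count matrix of shape [n_terms, n_docs])."""
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc:
            counts[index[w], j] += 1
    return index, counts

def truncated_svd(counts, k):
    """LSI step: keep the k largest singular values and their singular vectors."""
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def fold_in(tokens, index, U_k, s_k):
    """Map a bag-of-words query into the latent space: q_hat = S_k^-1 U_k^T q."""
    q = np.zeros(U_k.shape[0])
    for w in tokens:
        if w in index:  # words unseen during training are ignored
            q[index[w]] += 1
    return (U_k.T @ q) / s_k

def rank_by_cosine(query_vec, doc_vecs):
    """Return (doc indices sorted by descending cosine similarity, the similarities)."""
    denom = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    sims = (doc_vecs @ query_vec) / np.where(denom == 0, 1.0, denom)
    return np.argsort(-sims), sims

# Hypothetical training corpus with two word clusters.
docs = [
    "computer system interface user".split(),
    "user interface computer response time".split(),
    "graph trees minors".split(),
    "graph minors survey trees".split(),
]
index, counts = term_doc_matrix(docs)
U_k, s_k, Vt_k = truncated_svd(counts, k=2)
doc_vecs = Vt_k.T  # row j is the latent vector of training document j
query_vec = fold_in("computer user".split(), index, U_k, s_k)
order, sims = rank_by_cosine(query_vec, doc_vecs)
print("best matches:", order[:2], "similarities:", np.round(sims, 2))
```

With k=2 the two latent dimensions line up with the two word clusters in this toy corpus, so the query "computer user" ranks the first two documents highest even though neither one matches it word for word.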

2010

(Rehurek & Sojka, 2010) ⇒ Radim Rehurek, and Petr Sojka. (2010). “Software Framework for Topic Modelling with Large Corpora.” In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks.

Retrieved from "http://www.gabormelli.com/RKB/index.php?title=GenSim_System&oldid=888590"