Random Indexing Algorithm
A Random Indexing Algorithm is a dimensionality reduction algorithm that incrementally approximates a random projection, assigning sparse random index vectors to items and accumulating them into context vectors so that the model's dimensionality does not need to grow as new items are encountered.
- …
- Counter-Example(s):
- See: Reflective Random Indexing, Distributional Semantics, Vector Space Model, Random Projection, Locality-Sensitive Hashing, Sparse Distributed Memory, Bit Vector, Hamming Distance, Document Clustering.
References
2015
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/random_indexing Retrieved:2015-2-2.
- Random indexing is a dimension reduction method and computational framework for Distributional semantics, based on the insight that very-high-dimensional Vector Space Model implementations are impractical, that models need not grow in dimensionality when new items (e.g. new terminology) are encountered, and that a high-dimensional model can be projected into a space of lower dimensionality without compromising L2 distance metrics if the resulting dimensions are chosen appropriately, which is the original point of the random projection approach to dimension reduction first formulated as the Johnson–Lindenstrauss lemma. Locality-sensitive hashing has some of the same starting points. Random indexing, as used in representation of language, originates from the work of Pentti Kanerva on Sparse distributed memory, and can be described as an incremental formulation of a random projection.
It can also be verified that random indexing is a random projection technique for the construction of Euclidean spaces, i.e. L2-normed vector spaces. [1] In Euclidean spaces, random projections are elucidated using the Johnson–Lindenstrauss lemma. [2] TopSig [3] extends the Random Indexing model to produce bit vectors for comparison with the Hamming distance similarity function. It is used for improving the performance of information retrieval and document clustering.
- ↑ QasemiZadeh, B. & Handschuh, S. (2014) Random Manhattan Indexing, In: Proceedings of the 25th International Workshop on Database and Expert Systems Applications.
- ↑ Johnson, W. and Lindenstrauss, J. (1984) Extensions of Lipschitz mappings into a Hilbert space, in Contemporary Mathematics. American Mathematical Society, vol. 26, pp. 189–206.
- ↑ Geva, S. & De Vries, C.M. (2011) TopSig: Topology Preserving Document Signatures, In: Proceedings of Conference on Information and Knowledge Management 2011, 24-28 October 2011, Glasgow, Scotland.
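The passage above describes random indexing as an incremental formulation of a random projection, and TopSig as an extension that compares bit-vector signatures under the Hamming distance. The following Python sketch illustrates one common word-context variant of the idea; the toy corpus, the dimensionality (d=512), the number of non-zero entries per index vector, the context window, and the zero-threshold signature step are all illustrative assumptions, not the reference Random Indexing or TopSig implementations.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_index_vector(d=512, nnz=10):
    """Sparse ternary index vector: a few randomly placed +1/-1 entries, zeros elsewhere."""
    v = np.zeros(d)
    positions = rng.choice(d, size=nnz, replace=False)
    v[positions] = rng.choice([1.0, -1.0], size=nnz)
    return v

def random_indexing(documents, d=512, window=2):
    """Accumulate context vectors incrementally: each occurrence of a word
    adds the (fixed) index vectors of its neighbours to that word's context vector."""
    index_vectors, context_vectors = {}, {}
    for doc in documents:
        tokens = doc.lower().split()
        for word in tokens:
            if word not in index_vectors:
                index_vectors[word] = make_index_vector(d)
                context_vectors[word] = np.zeros(d)
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    context_vectors[word] += index_vectors[tokens[j]]
    return context_vectors

# TopSig-style step (an assumption for illustration): threshold each context
# vector at zero to obtain a bit signature, then compare signatures by Hamming distance.
def to_signature(v):
    return v > 0

def hamming(a, b):
    return int(np.count_nonzero(a != b))

docs = ["random indexing projects words into a low dimensional space",
        "random projection preserves distances in a lower dimensional space"]
ctx = random_indexing(docs)
print(hamming(to_signature(ctx["random"]), to_signature(ctx["space"])))
```

Because a new word simply receives a fresh sparse index vector, the model's dimensionality stays fixed at d no matter how much new terminology is encountered, which is the incremental property emphasized in the quoted passage.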
2010
- (Cohen et al., 2010) ⇒ Trevor Cohen, Roger Schvaneveldt, and Dominic Widdows. (2010). “Reflective Random Indexing and indirect inference: A scalable method for discovery of implicit connections.” In: Journal of Biomedical Informatics, 43(2).
- QUOTE: The discovery of implicit connections between terms that do not occur together in any scientific document underlies the model of literature-based knowledge discovery first proposed by Swanson. Corpus-derived statistical models of semantic distance such as Latent Semantic Analysis (LSA) have been evaluated previously as methods for the discovery of such implicit connections. However, LSA in particular is dependent on a computationally demanding method of dimension reduction as a means to obtain meaningful indirect inference, limiting its ability to scale to large text corpora. In this paper, we evaluate the ability of Random Indexing (RI), a scalable distributional model of word associations, to draw meaningful implicit relationships between terms in general and biomedical language. Proponents of this method have achieved comparable performance to LSA on several cognitive tasks while using a simpler and less computationally demanding method of dimension reduction than LSA employs. In this paper, we demonstrate that the original implementation of RI is ineffective at inferring meaningful indirect connections, and evaluate Reflective Random Indexing (RRI), an iterative variant of the method that is better able to perform indirect inference. RRI is shown to lead to more clearly related indirect connections and to outperform existing RI implementations in the prediction of future direct co-occurrence in the MEDLINE corpus.
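The abstract above characterizes Reflective Random Indexing (RRI) as an iterative variant of RI that enables indirect inference between terms that never co-occur in the same document. The sketch below is only a schematic illustration of that reflection idea under assumptions (a toy three-document corpus, term-document rather than sliding-window training, a single reflection cycle, and arbitrary parameters); it is not the implementation evaluated by Cohen et al.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 256    # reduced dimensionality (assumed)
NNZ = 8    # non-zero entries per sparse index vector (assumed)

def sparse_index_vector():
    """Ternary random index vector with a few +1/-1 entries."""
    v = np.zeros(D)
    pos = rng.choice(D, size=NNZ, replace=False)
    v[pos] = rng.choice([1.0, -1.0], size=NNZ)
    return v

# Toy corpus: "fish" and "raynaud" never occur in the same document.
docs = ["fish oil blood viscosity",
        "blood viscosity raynaud syndrome",
        "dietary fish oil platelet aggregation"]
vocab = sorted({w for doc in docs for w in doc.split()})

# Cycle 0: every term receives a random index vector.
term_vecs = {w: sparse_index_vector() for w in vocab}

# Cycle 1 ("reflection"): document vectors are sums of their terms' vectors,
# and term vectors are then rebuilt from the documents in which each term occurs.
doc_vecs = [sum(term_vecs[w] for w in doc.split()) for doc in docs]
term_vecs = {w: sum(dv for doc, dv in zip(docs, doc_vecs) if w in doc.split())
             for w in vocab}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "fish" and "raynaud" share no document, but after reflection both vectors
# contain the index vectors of the bridging terms "blood" and "viscosity",
# so their similarity is typically no longer near zero.
print(cosine(term_vecs["fish"], term_vecs["raynaud"]))
```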