Bag-of-Words Document Vector
A Bag-of-Words Document Vector is a document vector that is a bag-of-words vector.
- AKA: Document Term Index Vector.
- Context: It can range from being a Binarized BoW Document Vector to being a Weighted BoW Document Vector.
- Context: It can make use of an Indexing Vocabulary that defines the Indexing Space.
- …
- Counter-Example(s):
- a Word Vector.
- a Passage Vector.
- See: Mention Vector, Passage Word Vector, Weighted Vector.
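The range from a binarized to a weighted vector over a fixed indexing vocabulary can be illustrated with a minimal sketch. It assumes whitespace tokenization, lowercasing, and raw term counts as the weighting scheme; none of these choices is prescribed by this entry, and the function names `build_vocabulary` and `bow_vector` are hypothetical.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct token to a fixed index (the indexing vocabulary)."""
    # Assumption: tokens are lowercased, whitespace-separated strings.
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def bow_vector(document, vocabulary, binarized=False):
    """Return a bag-of-words vector with one dimension per vocabulary term."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        idx = vocabulary.get(tok)
        if idx is not None:  # out-of-vocabulary tokens are ignored
            # Binarized: presence/absence. Otherwise: raw count as the weight.
            vector[idx] = 1 if binarized else n
    return vector


docs = ["the cat sat on the mat", "the dog sat"]
vocab = build_vocabulary(docs)  # indices follow sorted token order
counts_vec = bow_vector(docs[0], vocab)
binary_vec = bow_vector(docs[0], vocab, binarized=True)
```

Both vectors have the same dimensionality as the vocabulary, so documents indexed against the same vocabulary are directly comparable; word order within the document is discarded.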
References
2007
- (Recupero, 2007) ⇒ Diego Reforgiato Recupero. (2007). “A New Unsupervised Method for Document Clustering by using WordNet Lexical and Conceptual Relations.” In: Information Retrieval (2007) 10:563–579.
- QUOTE: Many well-known methods of text clustering make use of a long list of words as vector space which is often unsatisfactory for a couple of reasons: first, it keeps the dimensionality of the data very high, and second, it ignores important relationships between terms like synonyms or antonyms. Our unsupervised method solves both problems by using ANNIE and WordNet lexical categories and WordNet ontology in order to create a well structured document vector space whose low dimensionality allows common clustering algorithms to perform well.
2001
- (Stephens et al., 2001) ⇒ M. Stephens, M. Palakal, S. Mukhopadhyay, and R. Raje. (2001). “Detecting gene relations from MEDLINE abstracts.” In: Proceedings of the Sixth Annual Pacific Symposium on Biocomputing, pages 483–496.
- QUOTE: The document representation step converts text documents into structures that can be efficiently processed without the loss of vital content. At the core of this process is a thesaurus, an array T of atomic tokens (e.g., a single term) each identified by a unique numeric identifier culled from authoritative sources or automatically. … The purpose of the document representation step is to convert each document to a weight vector whose dimension is the same as the number of terms in the thesaurus.
1975
- (Salton et al., 1975) ⇒ Gerard M. Salton, A. Wong, and C. Yang. (1975). “A Vector Space Model for Automatic Indexing.” In: Communications of the ACM, 18(11). doi:10.1145/361219.361220.