Term-Document Co-Occurrence Matrix
A Term-Document Co-Occurrence Matrix is a word-text item co-occurrence matrix in which each cell holds a word-document co-occurrence statistic.
- AKA: Word-Document Matrix.
- Context:
- It can range from being a Raw Document-Term Co-Occurrence Matrix to being a Normalized Document-Term Co-Occurrence Matrix, such as a word-document PMI matrix.
- It can be produced by a Term-Document Matrix Construction System (that solves a Term-Document Matrix Construction Task).
- Example(s):
      I   like   hate   databases
D1    1   1      0      1
D2    1   0      1      1
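A minimal sketch of how such a raw count matrix could be built. The document texts ("I like databases", "I hate databases") are assumptions inferred from the counts in the example, and `build_doc_term_matrix` is a hypothetical helper name, not part of any library.

```python
from collections import Counter

# Assumed document texts, chosen so the counts match the example matrix.
docs = {"D1": "I like databases", "D2": "I hate databases"}
# Column order follows the example: I, like, hate, databases.
vocab = ["I", "like", "hate", "databases"]

def build_doc_term_matrix(docs, vocab):
    """Return one row of raw term counts per document, in vocab order."""
    rows = {}
    for name, text in docs.items():
        counts = Counter(text.split())  # naive whitespace tokenization
        rows[name] = [counts[term] for term in vocab]
    return rows

matrix = build_doc_term_matrix(docs, vocab)
for name, row in matrix.items():
    print(name, row)
# D1 [1, 1, 0, 1]
# D2 [1, 0, 1, 1]
```

Whitespace splitting is used only to keep the sketch short; a real construction system would normally apply its own tokenization and case normalization.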
- Counter-Example(s):
- See: Word Similarity Function Training Task, tf-idf Matrix.
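The step from a raw matrix to a normalized one, such as a word-document PMI matrix, can be sketched as follows. This assumes the common definition PMI(w, d) = log2( p(w, d) / (p(w) p(d)) ) with probabilities estimated from the counts; mapping zero counts to 0.0 is an illustrative choice, since PMI is undefined there. The function name `pmi_matrix` is hypothetical.

```python
import math

def pmi_matrix(counts):
    """Convert raw document-term counts into pointwise mutual information."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]        # tokens per document
    col_sums = [sum(col) for col in zip(*counts)]  # occurrences per term
    result = []
    for i, row in enumerate(counts):
        out = []
        for j, c in enumerate(row):
            if c == 0:
                # Assumption: PMI is undefined for zero counts; use 0.0 here.
                out.append(0.0)
            else:
                # p(w,d) / (p(w) p(d)) simplifies to c * total / (row * col).
                out.append(math.log2(c * total / (row_sums[i] * col_sums[j])))
        result.append(out)
    return result

# Raw counts from the example (rows D1, D2; columns I, like, hate, databases).
counts = [[1, 1, 0, 1], [1, 0, 1, 1]]
pmi = pmi_matrix(counts)
```

On these counts, terms that occur in both documents ("I", "databases") score 0, while "like" and "hate" score 1 in the one document that contains them.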
References
2015
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/document-term_matrix Retrieved:2015-2-18.
- A document-term matrix or term-document matrix is a mathematical matrix that describes the frequency of terms that occur in a collection of documents. In a document-term matrix, rows correspond to documents in the collection and columns correspond to terms. There are various schemes for determining the value that each entry in the matrix should take. One such scheme is tf-idf. They are useful in the field of natural language processing.
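As an illustration of one such weighting scheme, the sketch below applies a basic tf-idf variant to a raw count matrix. It assumes tf is the raw count and idf is ln(N / df); many other tf-idf variants exist, and `tfidf_matrix` is a hypothetical helper name.

```python
import math

def tfidf_matrix(counts):
    """Weight raw counts by inverse document frequency (one simple variant)."""
    n_docs = len(counts)
    # Document frequency: number of documents in which each term occurs.
    df = [sum(1 for c in col if c > 0) for col in zip(*counts)]
    # Assumption: idf = ln(N / df); terms that never occur get weight 0.
    idf = [math.log(n_docs / d) if d else 0.0 for d in df]
    return [[c * w for c, w in zip(row, idf)] for row in counts]

# Rows are documents and columns are terms, as in the description above.
counts = [[1, 1, 0, 1], [1, 0, 1, 1]]
tfidf = tfidf_matrix(counts)
```

Under this variant, a term that occurs in every document receives a weight of zero, so only terms that distinguish documents keep non-zero entries.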
2009
- (Recchia & Jones, 2009) ⇒ Gabriel Recchia, and Michael N. Jones. (2009). “More Data Trumps Smarter Algorithms: Comparing Pointwise Mutual Information with Latent Semantic Analysis.” In: Behavior Research Methods, 41(3).
- QUOTE: As described by Landauer, Foltz, and Laham (1998), LSA operates by first constructing a term–document matrix in which the value of each cell (i,j) represents the number of occurrences of word i in document j.
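Note that this quote indexes cells as (word i, document j), so rows are terms, whereas a document-term matrix has one row per document. A small sketch of converting between the two orientations, using counts for two assumed documents over four terms (the helper name `transpose` is illustrative):

```python
def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]

# Document-term orientation: one row per document, one column per term.
doc_term = [[1, 1, 0, 1],
            [1, 0, 1, 1]]

# Term-document orientation: cell (i, j) is the count of term i in document j.
term_doc = transpose(doc_term)
print(term_doc)
# [[1, 1], [1, 0], [0, 1], [1, 1]]
```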