tf-idf Score Vector
A tf-idf Score Vector is a score vector that represents the tf-idf score of each member [math]\displaystyle{ t }[/math] of a multiset [math]\displaystyle{ D_x }[/math] (often a bag-of-words) with respect to a multiset space [math]\displaystyle{ C }[/math] (often a text corpus).
- Context:
- It can (often) be a Sparse Vector.
- It can (often) be used by a tf-idf Vector Distance Function.
- It can be represented as [math]\displaystyle{ [\text{tf-idf}(t_1,D_x,C), \text{tf-idf}(t_2,D_x,C), ..., \text{tf-idf}(t_n,D_x,C)] }[/math], where ...
- It can be a member of a tf-idf Score Matrix.
- ...
- Example(s):
- a Word tf-idf Score Vector, such as: [math]\displaystyle{ \text{tf-idf}(\text{doc}_{27}; \text{Corpus}_C) = [(``\text{apple}",0.0017...), ..., (``\text{chrome-plated}",0), ..., (``\text{zoo}",0)] }[/math].
- an N-Gram tf-idf Score Vector, such as: [math]\displaystyle{ \text{tf-idf}(\text{doc}_{27}; \text{Corpus}_C) = [(``\text{apple}",0.0017...), ..., (``\text{chrome-plated}",0), ..., (``\text{zoo}",0)] }[/math].
- …
- Counter-Example(s):
- See: Text Clustering.
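The vector form described above can be illustrated with a minimal sketch. It assumes one common weighting variant (term frequency as count divided by document length, inverse document frequency as the natural log of corpus size over document frequency); other variants exist and would give different scores. The function name `tfidf_vector`, the toy corpus, and the vocabulary ordering are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def tfidf_vector(doc_tokens, corpus, vocabulary):
    """Return one tf-idf score per vocabulary term for a single document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    Terms that occur in no corpus document get a score of 0.
    """
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    vector = []
    for term in vocabulary:
        tf = counts[term] / len(doc_tokens) if doc_tokens else 0.0
        # Document frequency: number of corpus documents containing the term.
        df = sum(1 for d in corpus if term in d)
        idf = math.log(n_docs / df) if df else 0.0
        vector.append(tf * idf)
    return vector

# Toy corpus: each document is a bag-of-words given as a token list.
corpus = [
    ["apple", "pie", "apple"],
    ["zoo", "animal"],
    ["apple", "zoo"],
]
vocabulary = sorted({t for d in corpus for t in d})

doc_vector = tfidf_vector(corpus[0], corpus, vocabulary)
scores = dict(zip(vocabulary, doc_vector))

# Sparse form: keep only the terms with a non-zero score.
sparse = {t: s for t, s in scores.items() if s != 0.0}
```

Most entries are zero because most vocabulary terms do not occur in any one document, which is why such vectors are often stored sparsely. Stacking the vectors of all corpus documents row by row would give a tf-idf score matrix.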
References
2013
- http://www.philippeadjiman.com/blog/2010/12/30/how-to-easily-build-and-observe-tf-idf-weight-vectors-with-lucene-and-mahout/
- You have a collection of text documents, and you want to build their TF-IDF weight vectors, probably before doing some clustering on the collection or other related tasks.