Relative Word Frequency Function
A Relative Word Frequency Function is a relative text item frequency function [math]\displaystyle{ tf }[/math] that produces a relative term frequency for some word form [math]\displaystyle{ w }[/math] with respect to a word form set [math]\displaystyle{ D }[/math].
- AKA: Term Frequency, Lexical Distribution, [math]\displaystyle{ tf(w,D) }[/math] .
- Context:
- It can be calculated by taking the ratio of the Term Absolute Frequency Count to the Word Form Set Size.
- It can be a component of a TF-IDF Ranking Function.
- It can be used as a Word Frequency Rank Function.
- Quantitatively, a term's relevance generally does not increase in linear proportion to its term frequency.
- Example(s):
- [math]\displaystyle{ TF(t,D) = N(t,D)/N(D) }[/math], where [math]\displaystyle{ N(t,D) }[/math] is the number of times that the term appears in the document, and [math]\displaystyle{ N(D) }[/math] is the total number of terms in the document (a short code sketch follows the See list below).
- …
- Counter-Example(s):
- See: TF-IDF Weight, Ranking Function, Zipf's Law.
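The following is a minimal illustrative sketch of the ratio defined above (the Term Absolute Frequency Count divided by the Word Form Set Size); it is not taken from any of the cited sources, and the function name relative_term_frequency and the whitespace/lowercase tokenization are assumptions made for the example.

```python
from collections import Counter

def relative_term_frequency(term: str, document: str) -> float:
    """Relative frequency of `term` in `document`: N(t,D) / N(D).

    Assumes a simple whitespace/lowercase tokenization; a real system
    would substitute its own tokenizer and normalization.
    """
    tokens = document.lower().split()          # the word form set D as a token list
    if not tokens:                             # guard against division by zero on empty input
        return 0.0
    counts = Counter(tokens)                   # absolute frequency counts N(t, D)
    return counts[term.lower()] / len(tokens)  # ratio to the word form set size N(D)

# Example usage on a hypothetical document:
doc = "the cat sat on the mat"
print(relative_term_frequency("the", doc))     # 2 / 6 ≈ 0.333
```

A value computed this way is the term-frequency component that a TF-IDF Ranking Function would combine with an inverse document frequency weight.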
References
2009
- (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Tf%E2%80%93idf#Mathematical_details
- The term count in the given document is simply the number of times a given term appears in that document. This count is usually normalized to prevent a bias towards longer documents (which may have a higher term count regardless of the actual importance of that term in the document) to give a measure of the importance of the term [math]\displaystyle{ t_{i} }[/math] within the particular document [math]\displaystyle{ d_{j} }[/math]. Thus we have the term frequency, defined as follows.
[math]\displaystyle{ \mathrm{tf_{i,j}} = \frac{n_{i,j}}{\sum_k n_{k,j}} }[/math] where [math]\displaystyle{ n_{i,j} }[/math]
is the number of occurrences of the considered term ([math]\displaystyle{ t_{i} }[/math]) in document [math]\displaystyle{ d_{j} }[/math], and the denominator is the sum of number of occurrences of all terms in document [math]\displaystyle{ d_{j} }[/math].
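- As an illustrative instance (not part of the quoted source): if the term [math]\displaystyle{ t_{i} }[/math] occurs 3 times in a document [math]\displaystyle{ d_{j} }[/math] that contains 100 term occurrences in total, then [math]\displaystyle{ \mathrm{tf_{i,j}} = 3/100 = 0.03 }[/math].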
2003
- (Mitkov, 2003) ⇒ Ruslan Mitkov, editor. (2003). “The Oxford Handbook of Computational Linguistics." Oxford University Press. ISBN:019927634X
- QUOTE: term frequency (tf): A measurement of the frequency of a word or term within a particular document. Term frequency reflects how well that term describes the document contents.
2001
- (Jacquemin, 2001) ⇒ Christian Jacquemin. (2001). “Spotting and Discovering Terms Through Natural Language Processing." MIT Press. ISBN:0262100851
- QUOTE: Relative frequency: The relative frequency of a term in a document is the ratio of its number of occurrences to the size of the document.