Relative Word Frequency Function

From GM-RKB

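A Relative Word Frequency Function is a relative text item frequency function [math]\displaystyle{ tf }[/math] that returns the relative term frequency of a word form [math]\displaystyle{ w }[/math] with respect to a word form set [math]\displaystyle{ D }[/math], i.e. the number of occurrences of [math]\displaystyle{ w }[/math] in [math]\displaystyle{ D }[/math] divided by the total number of word occurrences in [math]\displaystyle{ D }[/math].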
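The definition can be illustrated with a minimal Python sketch. It assumes that the word form set [math]\displaystyle{ D }[/math] is given as a sequence of word tokens (so repeated word forms are counted separately) and that the relative frequency is the count of [math]\displaystyle{ w }[/math] divided by the size of [math]\displaystyle{ D }[/math]. The function name relative_word_frequency is hypothetical.

```python
from collections import Counter
from typing import Sequence


def relative_word_frequency(w: str, D: Sequence[str]) -> float:
    """Return the number of occurrences of word form w in D divided by len(D).

    Assumption of this sketch: D is a sequence of word tokens, and an
    empty D yields 0.0 instead of raising ZeroDivisionError.
    """
    if not D:
        return 0.0
    counts = Counter(D)        # occurrences of each word form in D
    return counts[w] / len(D)  # Counter returns 0 for word forms not in D


tokens = "the cat sat on the mat".split()
print(relative_word_frequency("the", tokens))  # 2 of 6 tokens, about 0.333
print(relative_word_frequency("dog", tokens))  # 0.0
```

Returning 0.0 for an empty D is only a convention chosen for this sketch; a stricter version could raise an error instead.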



References

2009

  • (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Tf%E2%80%93idf#Mathematical_details
    • The term count in the given document is simply the number of times a given term appears in that document. This count is usually normalized to prevent a bias towards longer documents (which may have a higher term count regardless of the actual importance of that term in the document) to give a measure of the importance of the term [math]\displaystyle{ t_{i} }[/math] within the particular document [math]\displaystyle{ d_{j} }[/math]. Thus we have the term frequency, defined as follows.

      [math]\displaystyle{ \mathrm{tf_{i,j}} = \frac{n_{i,j}}{\sum_k n_{k,j}} }[/math]

      where [math]\displaystyle{ n_{i,j} }[/math] is the number of occurrences of the considered term ([math]\displaystyle{ t_{i} }[/math]) in document [math]\displaystyle{ d_{j} }[/math], and the denominator is the sum of number of occurrences of all terms in document [math]\displaystyle{ d_{j} }[/math].
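The quoted normalization can be sketched in Python as follows, assuming each document [math]\displaystyle{ d_{j} }[/math] has already been tokenized into a list of terms. The function name term_frequencies is hypothetical, and the two sample documents are made up for illustration. Both documents contain the term "data" twice, so the raw counts [math]\displaystyle{ n_{i,j} }[/math] are equal, yet the term makes up a smaller share of the longer document and therefore receives the lower [math]\displaystyle{ \mathrm{tf_{i,j}} }[/math].

```python
from collections import Counter
from typing import Dict, List


def term_frequencies(doc: List[str]) -> Dict[str, float]:
    """Compute tf_{i,j} = n_{i,j} / sum_k n_{k,j} for every term t_i in document d_j."""
    counts = Counter(doc)         # n_{i,j} for each term t_i in the document
    total = sum(counts.values())  # sum_k n_{k,j}, i.e. the document length
    # An empty document has no terms, so the comprehension never divides by zero.
    return {term: n / total for term, n in counts.items()}


short_doc = "data mining finds data patterns".split()  # 5 terms
long_doc = ("data mining finds data patterns in large "
            "collections of text and records").split()  # 12 terms

print(term_frequencies(short_doc)["data"])  # 2 of 5 terms: 0.4
print(term_frequencies(long_doc)["data"])   # 2 of 12 terms, about 0.167
```

Because each value is divided by the document length, the frequencies of all terms in one document sum to 1.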

2003

2001