Term Weighting Function
A Term Weighting Function is a weight function that assigns a numeric weight to a term, typically to a term occurring in a document, relative to a corpus.
- Context:
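- Example(s):
  - a TF-IDF weighting function.
  - a TF-ICF weighting function (Reed et al., 2006).
  - a tf.rf weighting function (Lan, Tan et al., 2005).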
- See: Term Multiset, Term Tuple Distance, Multiset Distance.
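As a minimal sketch, one common term weighting function is TF-IDF, which weights a term in a document by its frequency there, scaled down by how many documents in the corpus contain it. The Python below assumes raw term frequency and idf = ln(N / df), where N is the number of documents and df is the term's document frequency; the function name `tf_idf` and the toy corpus are hypothetical, and many other TF-IDF variants exist.

```python
import math
from collections import Counter


def tf_idf(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    """Weight of `term` in `doc`, relative to `corpus` (a list of tokenized documents).

    Assumed variant: raw term frequency times ln(N / df).
    """
    tf = Counter(doc)[term]                   # occurrences of the term in this document
    df = sum(1 for d in corpus if term in d)  # documents in the corpus containing the term
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


corpus = [
    ["term", "weighting", "for", "text"],
    ["text", "clustering", "with", "text", "streams"],
    ["support", "vector", "machines"],
]

print(tf_idf("text", corpus[1], corpus))      # 2 * ln(3 / 2)
print(tf_idf("machines", corpus[2], corpus))  # 1 * ln(3 / 1)
```

Under this variant, a term that occurs in every document of the corpus gets weight 0, since ln(N / N) = 0.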
References
2006
- (Reed et al., 2006) ⇒ Joel W. Reed, Yu Jiao, Thomas E. Potok, Brian A. Klump, Mark T. Elmore, and Ali R. Hurson. (2006). “TF-ICF: A New Term Weighting Scheme for Clustering Dynamic Data Streams.” In: Proceedings of the 5th International Conference on Machine Learning and Applications (ICMLA'06).
- QUOTE: Typically, clustering algorithms use the Vector Space Model (VSM) [17] to encode documents. The VSM relates terms to documents, and since different terms have different importance in a given document, a term weight is associated with every term [18]. These term weights are often derived from the frequency of a term within a document or set of documents. Many term weighting schemes have been proposed [5,9,18]. Most of these existing methods work under the assumption that the whole data set is available and static.
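The setup in this quote can be sketched as follows: each document becomes a vector over a shared vocabulary, with one entry per term given by a term weighting function. This Python sketch is illustrative only; `vsm_encode`, `raw_tf`, `tf_idf`, and `cosine` are hypothetical helpers, and the idf variant ln(N / df) is an assumption. It also shows why corpus-level weights presuppose a static data set: idf depends on N and on document frequencies, so adding one document can change existing vectors.

```python
import math
from typing import Callable

Doc = list[str]
# A term weighting function maps (term, document, corpus) to a weight.
WeightFn = Callable[[str, Doc, list[Doc]], float]


def raw_tf(term: str, doc: Doc, corpus: list[Doc]) -> float:
    # Frequency of the term within the document; the corpus is ignored.
    return float(doc.count(term))


def tf_idf(term: str, doc: Doc, corpus: list[Doc]) -> float:
    # Within-document frequency scaled by ln(N / df) over the whole corpus.
    tf = doc.count(term)
    df = sum(1 for d in corpus if term in d)
    return tf * math.log(len(corpus) / df) if tf and df else 0.0


def vsm_encode(corpus: list[Doc], weight: WeightFn) -> tuple[list[str], list[list[float]]]:
    # Shared, sorted vocabulary; one weight per (document, term) pair.
    vocab = sorted({t for d in corpus for t in d})
    vectors = [[weight(t, d, corpus) for t in vocab] for d in corpus]
    return vocab, vectors


def cosine(u: list[float], v: list[float]) -> float:
    # Cosine similarity between two document vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


corpus = [["data", "stream", "data"], ["data", "clustering"]]
vocab, vecs = vsm_encode(corpus, tf_idf)
print(vocab, vecs)

# Adding one document changes N and the document frequencies,
# so the tf-idf vector of the first document changes as well.
_, vecs_after = vsm_encode(corpus + [["stream", "clustering"]], tf_idf)
print(vecs[0], vecs_after[0])
```

By contrast, `raw_tf` vectors depend only on each document itself, so they stay fixed as new documents arrive.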
2005
- (Lan, Tan et al., 2005) ⇒ Man Lan, Chew-Lim Tan, Hwee-Boon Low, and Sam-Yuan Sung. (2005). “A Comprehensive Comparative Study on Term Weighting Schemes for Text Categorization with Support Vector Machines.” In: Special interest tracks and posters of the 14th International Conference on World Wide Web.
- QUOTE: Term weighting scheme, which has been used to convert the documents as vectors in the term space, is a vital step in automatic text categorization. In this paper, we conducted comprehensive experiments to compare various term weighting schemes with SVM on two widely-used benchmark data sets. We also presented a new term weighting scheme tf:rf to improve the term's discriminating power. The controlled experimental results showed that this newly proposed tf:rf scheme is significantly better than other widely-used term weighting schemes. Compared with schemes related with tf factor alone, the idf factor does not improve or even decrease the term's discriminating power for text categorization.
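The contrast drawn in this quote can be illustrated with a small, hypothetical binary-categorization example. The Python sketch assumes the relevance-frequency factor has the form rf = log2(2 + a / max(1, c)), where a and c count the positive- and negative-category training documents containing the term; this follows the formulation in later work by Lan and colleagues, and details of the 2005 poster may differ. The idf factor uses the assumed variant ln(N / df). Function names and data are illustrative only.

```python
import math

# Each training document: (tokens, is_positive) for one binary category.
Labeled = list[tuple[list[str], bool]]


def idf(term: str, docs: Labeled) -> float:
    # Ignores category labels: depends only on how many documents contain the term.
    df = sum(1 for tokens, _ in docs if term in tokens)
    return math.log(len(docs) / df) if df else 0.0


def rf(term: str, docs: Labeled) -> float:
    # Uses labels: favors terms concentrated in the positive category.
    a = sum(1 for tokens, pos in docs if pos and term in tokens)
    c = sum(1 for tokens, pos in docs if not pos and term in tokens)
    return math.log2(2 + a / max(1, c))


def tf_weight(term: str, tokens: list[str], factor: float) -> float:
    # tf.idf or tf.rf: raw term frequency times a collection-level factor.
    return tokens.count(term) * factor


docs: Labeled = [
    (["goal", "match", "team"], True),
    (["goal", "team", "win"], True),
    (["goal", "score"], True),
    (["market", "stock"], False),
    (["market", "shares", "team"], False),
    (["market", "trade"], False),
]

for term in ["goal", "team", "market"]:
    print(term, round(idf(term, docs), 3), round(rf(term, docs), 3))

print(tf_weight("team", docs[1][0], rf("team", docs)))
```

Each of the three terms occurs in three of the six documents, so idf gives them identical weights, while rf ranks `goal` above `team` above `market` for the positive category.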