2008 FastLogisticRegressionforTextCa
- (Ifrim et al., 2008) ⇒ Georgiana Ifrim, Gökhan Bakir, and Gerhard Weikum. (2008). “Fast Logistic Regression for Text Categorization with Variable-length N-grams.” In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2008). doi:10.1145/1401890.1401936
Subject Headings:
Notes
Cited By
- http://scholar.google.com/scholar?q=%22Fast+logistic+regression+for+text+categorization+with+variable-length+n-grams%22+2008
- http://portal.acm.org/citation.cfm?doid=1401890.1401936&preflayout=flat#citedby
Quotes
Author Keywords
Abstract
A common representation used in text categorization is the bag-of-words model (a.k.a. the unigram model). Learning with this representation typically involves some preprocessing, e.g., stopword removal and stemming. This results in one explicit tokenization of the corpus. In this work, we introduce a logistic regression approach where learning involves automatic tokenization. This allows us to weaken the a priori knowledge required about the corpus and results in a tokenization with variable-length (word or character) n-grams as basic tokens. We accomplish this by solving logistic regression using gradient ascent in the space of all n-grams. We show that this can be done very efficiently using a branch-and-bound approach that chooses the maximum gradient ascent direction projected onto a single dimension (i.e., a candidate feature). Although the space is very large, our method allows us to investigate variable-length n-gram learning. We demonstrate the efficiency of our approach compared to state-of-the-art classifiers used for text categorization, such as cyclic coordinate descent logistic regression and support vector machines.
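The abstract's core loop (pick the single n-gram coordinate with the steepest gradient, then step along it) can be illustrated with a minimal Python sketch. It assumes binary occurrence features for contiguous n-grams, no intercept and no regularization, and a clipped one-dimensional Newton step as the coordinate update. These are our simplifications, not necessarily the bound, update rule, or data structures used by Ifrim et al. All function names (`best_ngram`, `train`, `predict`, `occurrences`) are hypothetical. Pruning relies on anti-monotonicity: an extended n-gram can only occur in a subset of the documents that contain its prefix, which bounds the gradient of every n-gram in the subtree.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def occurrences(docs, s):
    """Indices of docs (token sequences) containing n-gram s, plus tokens that follow s."""
    n, occ, nxt = len(s), [], set()
    for i, d in enumerate(docs):
        hit = False
        for k in range(len(d) - n + 1):
            if tuple(d[k:k + n]) == s:
                hit = True
                if k + n < len(d):
                    nxt.add(d[k + n])
        if hit:
            occ.append(i)
    return occ, nxt


def best_ngram(docs, y, margins, max_len=4):
    """Branch and bound for the n-gram feature with the largest |gradient|.

    For a binary feature s, the log-likelihood gradient w.r.t. its coefficient is
        g(s) = sum over docs i containing s of y_i * w_i,  w_i = sigmoid(-y_i * margin_i).
    Any right-extension of s occurs in a subset of those docs, so its |g| is at most
        max(sum of w_i over positive docs with s, sum of w_i over negative docs with s).
    """
    w = [sigmoid(-yi * m) for yi, m in zip(y, margins)]
    best_abs, best_s, best_g = 0.0, None, 0.0
    stack = [(t,) for t in sorted({t for d in docs for t in d})]  # unigram roots
    while stack:
        s = stack.pop()
        occ, nxt = occurrences(docs, s)
        g = sum(y[i] * w[i] for i in occ)
        if abs(g) > best_abs:
            best_abs, best_s, best_g = abs(g), s, g
        pos = sum(w[i] for i in occ if y[i] > 0)
        neg = sum(w[i] for i in occ if y[i] < 0)
        # Expand only if some extension could still beat the current best.
        if max(pos, neg) > best_abs and len(s) < max_len:
            stack.extend(s + (t,) for t in sorted(nxt))
    return best_s, best_g


def train(docs, y, iters=10, max_len=4, max_step=2.0):
    """Greedy coordinate ascent: each round updates the best n-gram's coefficient."""
    beta = {}                    # sparse model: n-gram tuple -> coefficient
    margins = [0.0] * len(docs)  # f(x_i) = sum of coefficients of n-grams in doc i
    for _ in range(iters):
        s, g = best_ngram(docs, y, margins, max_len)
        if s is None or abs(g) < 1e-9:
            break
        occ, _ = occurrences(docs, s)
        h = sum(sigmoid(margins[i]) * sigmoid(-margins[i]) for i in occ)
        step = max(-max_step, min(max_step, g / (h + 1e-12)))  # clipped 1-D Newton step
        beta[s] = beta.get(s, 0.0) + step
        for i in occ:
            margins[i] += step
    return beta


def predict(beta, doc):
    """P(y = +1 | doc): sigmoid of the summed coefficients of n-grams present in doc."""
    return sigmoid(sum(c for s, c in beta.items() if occurrences([doc], s)[0]))


if __name__ == "__main__":
    docs = [d.split() for d in ["this is not good", "not good at all",
                                "this is good", "good and fun"]]
    y = [-1, -1, 1, 1]
    model = train(docs, y, iters=5)
    for ngram, coef in model.items():
        print(" ".join(ngram), round(coef, 3))
```

Passing `list(text)` instead of `text.split()` as each document turns the same search into a character n-gram search. The sketch rescans every document at each search node for clarity; a practical implementation would index n-gram occurrences instead.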
References
Page | Author | Title | DOI
---|---|---|---
2008 FastLogisticRegressionforTextCa | Gerhard Weikum, Georgiana Ifrim, Gökhan Bakir | Fast Logistic Regression for Text Categorization with Variable-length N-grams | 10.1145/1401890.1401936