Supervised Text-Item Classification Algorithm
A Supervised Text-Item Classification Algorithm is a data-driven text-item classification algorithm that is a supervised classification algorithm.
- Context:
- It can be implemented by a Supervised Text Classification System (to solve a supervised text classification task).
- It can range from being a Fully-Supervised Text Classification Algorithm to being a Semi-Supervised Text Classification Algorithm.
- It can range from being a Supervised Binary Text Classification Algorithm to being a Supervised Multiclass Text Classification Algorithm.
- It can range from being a Supervised Unilabel Text Classification Algorithm to being a Supervised Multilabel Text Classification Algorithm.
- It can range from being a Supervised Free-Form Text-Item Classification Algorithm to being a Supervised Structured Text-Item Classification Algorithm.
- ...
- Example(s):
- Counter-Example(s):
- See: Supervised Text-Item Classification, Supervised Text-Item Classification System.
References
2023
- (Lin et al., 2023) ⇒ Yu-Chen Lin, Si-An Chen, Jie-Jyun Liu, and Chih-Jen Lin. (2023). “Linear Classifier: An Often-Forgotten Baseline for Text Classification.” In: arXiv preprint arXiv:2306.07111. doi:10.48550/arXiv.2306.07111
- NOTE:
- It utilizes 2622 preprocessed theft crime cases from a city spanning 2009-2019, aiming to enhance crime prediction accuracy using text classification.
- It employs the TF-IDF (Term Frequency-Inverse Document Frequency) model for feature extraction, determining the relevance of words in the crime data documents.
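As a rough illustration of the TF-IDF weighting mentioned in the note above, the following sketch computes term weights for a toy corpus. The corpus, the function name `tf_idf`, and the specific variant (raw term frequency times log(N / df)) are assumptions made for illustration; the note does not say which TF-IDF variant was used.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = tf(term, doc) * log(N / df(term)).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights

# Hypothetical toy corpus (not data from the cited work).
docs = [
    "bicycle stolen from garage".split(),
    "wallet stolen on bus".split(),
    "bicycle found near station".split(),
]
w = tf_idf(docs)
```

Terms that occur in fewer documents (such as "garage") receive a higher weight than terms shared across documents (such as "stolen"), which is the sense in which TF-IDF estimates a word's relevance to a document.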
2007a
- (Shehata et al., 2007) ⇒ Shady Shehata, Fakhri Karray, and Mohamed Kamel. (2007). “A Concept-based Model for Enhancing Text Categorization.” In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2007).
2007b
- (Thet et al., 2007) ⇒ Tun Thura Thet, Jin-Cheon Na, and Christopher S. G. Khoo. (2007). “Filtering Product Reviews from Web Search Results.” In: Proceedings of the 2007 ACM symposium on Document Engineering.
- Compares the performance of a Supervised Learning Algorithm and a Heuristic Approach to a Text Categorization Task that is based on Search Snippets.
2002a
- (Lodhi et al., 2002) ⇒ Huma Lodhi, Craig Saunders, John Shawe-Taylor, Nello Cristianini, and Chris Watkins. (2002). “Text Classification Using String Kernels.” In: The Journal of Machine Learning Research, 2.
- We propose a novel approach for categorizing text documents based on the use of a special kernel. The kernel is an inner product in the feature space generated by all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text though not necessarily contiguously.
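The subsequence feature space described in the excerpt above can be sketched by brute force for short strings. This is a minimal illustration, not the authors' algorithm: the function names are hypothetical, the enumeration is exponential in `k`, and the gap-penalizing `decay` factor is an assumption that the excerpt does not mention (setting `decay=1.0` reduces it to plain counting of shared subsequences).

```python
from collections import defaultdict
from itertools import combinations

def subsequence_features(text, k, decay):
    """Map each length-k subsequence of `text` to its accumulated weight.

    Each occurrence is weighted by decay ** span, where span is the number
    of characters from its first to its last position (inclusive), so
    non-contiguous occurrences count for less when decay < 1.
    """
    features = defaultdict(float)
    for idx in combinations(range(len(text)), k):
        span = idx[-1] - idx[0] + 1
        features["".join(text[i] for i in idx)] += decay ** span
    return features

def string_kernel(s, t, k=2, decay=0.5):
    """Inner product of the two subsequence feature vectors."""
    fs = subsequence_features(s, k, decay)
    ft = subsequence_features(t, k, decay)
    return sum(weight * ft[u] for u, weight in fs.items() if u in ft)

# "cat" and "car" share only the subsequence "ca" (span 2 in both strings).
similarity = string_kernel("cat", "car", k=2, decay=0.5)
```

A kernel value computed this way can be passed to any kernel-based learner; a practical implementation would use dynamic programming instead of enumerating index combinations.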
2002b
- (Sebastiani, 2002) ⇒ Fabrizio Sebastiani. (2002). “Machine Learning in Automated Text Categorization.” In: Association of Computing Machinery Computing Surveys (CSUR), 34(1).
- … In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories.
2001
- (Slonim and Tishby, 2001) ⇒ N. Slonim, and N. Tishby. (2001). “The Power of Word Clusters for Text Classification.” In: Proceedings of the 23rd European Colloquium on Information Retrieval Research (ECIR 2001).
2000
- (Nigam et al., 2000) ⇒ Kamal Nigam, Andrew McCallum, Tom M. Mitchell, and W. Cohen. (2000). “Text Classification from Labeled and Unlabeled Documents Using EM.” In: Machine Learning. doi:10.1023/A:1007692713085
- (Han & Karypis, 2000) ⇒ Eui-Hong (Sam) Han, and George Karypis. (2000). “Centroid-based Document Classification: Analysis and Experimental Results.” In: Army High Performance Computing.
1999
- (McCallum, 1999) ⇒ Andrew McCallum. (1999). “Multi-label Text Classification with a Mixture Model Trained by EM.” In: AAAI 99 Workshop on Text Learning.
- (Yang & Liu, 1999) ⇒ Yiming Yang, and Xin Liu. (1999). “A Re-examination of Text Categorization Methods.” In: Proceedings of the 22nd ACM SIGIR Conference Retrieval (SIGIR 1999).
- (Nigam et al., 1999) ⇒ Kamal Nigam, John Lafferty, and Andrew McCallum. (1999). “Using Maximum Entropy for Text Classification.” In: IJCAI-99 workshop on machine learning for information filtering.
- QUOTE: Maximum entropy is a probability distribution estimation technique widely used for a variety of natural language tasks, such as language modeling, part-of-speech tagging, and text segmentation. The underlying principle of maximum entropy is that without external knowledge, one should prefer distributions that are uniform. Constraints on the distribution, derived from labeled training data, inform the technique where to be minimally non-uniform. The maximum entropy formulation has a unique solution which can be found by the improved iterative scaling algorithm. In this paper, maximum entropy is used for text classification by estimating the conditional distribution of the class variable given the document. In experiments on several text datasets we compare accuracy to naive Bayes and show that maximum entropy is sometimes significantly better, but also sometimes worse. Much future work remains, but the results indicate that maximum entropy is a promising technique for text classification.
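The conditional model described in the quote above can be sketched as a log-linear classifier over (word, class) features. This is only an illustration under stated assumptions: plain gradient ascent on the conditional log-likelihood stands in for the improved iterative scaling algorithm named in the quote, and the function names, hyperparameters, and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def predict_proba(weights, classes, doc):
    """Conditional distribution P(class | doc) under the log-linear model."""
    counts = Counter(doc)
    scores = {c: sum(weights[(w, c)] * n for w, n in counts.items()) for c in classes}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

def train_maxent(data, epochs=200, lr=0.1):
    """Fit weights by gradient ascent (an assumption; not iterative scaling)."""
    classes = sorted({label for _, label in data})
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for doc, label in data:
            probs = predict_proba(weights, classes, doc)
            for w, n in Counter(doc).items():
                grad[(w, label)] += n             # empirical feature count
                for c in classes:
                    grad[(w, c)] -= probs[c] * n  # expected feature count
        for key, g in grad.items():
            weights[key] += lr * g
    return weights, classes

# Hypothetical toy training set.
data = [
    ("goal match team".split(), "sports"),
    ("team wins match".split(), "sports"),
    ("stock market falls".split(), "finance"),
    ("market price rises".split(), "finance"),
]
weights, classes = train_maxent(data)
seen = predict_proba(weights, classes, "team match".split())
unseen = predict_proba(weights, classes, ["weather"])
```

A document made only of words absent from the training data gets a uniform distribution over classes, which mirrors the principle that, without constraints from labeled data, the uniform distribution is preferred.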
1998
- (Apte et al., 1998) ⇒ C. Apte, F. Damerau, and Sholom M. Weiss. (1998). “Text mining with decision rules and decision trees.” In: Proceedings of the Conference on Automated Learning and Discovery, Workshop 6: Learning from Text and the Web.
- (Baker & McCallum, 1998) ⇒ L. Douglas Baker, and Andrew McCallum. (1998). “Distributional Clustering of Words for Text Classification.” In: Proceedings of the 21st ACM SIGIR Conference Retrieval (SIGIR 1998). doi:10.1145/290941.290970
- It suggests that a Word Stemming Task can impair classification performance.
- It proposes the clustering of terms that tend to indicate the presence of the same category.
- It applies a Bayesian Classification Algorithm.
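The combination noted above (word clusters used as features for a Bayesian classifier) can be sketched with a multinomial naive Bayes model that counts cluster identifiers instead of individual words. The word-to-cluster map below is hand-written and hypothetical; the distributional clustering procedure of the cited paper is not implemented here, and the function names and Laplace smoothing are likewise assumptions.

```python
import math
from collections import Counter, defaultdict

def train_nb(data, cluster_of):
    """Collect per-class document counts and per-class cluster counts."""
    class_docs = Counter()
    cluster_counts = defaultdict(Counter)
    for doc, label in data:
        class_docs[label] += 1
        # Words without a cluster assignment are kept as their own feature.
        cluster_counts[label].update(cluster_of.get(w, w) for w in doc)
    vocab = {f for counts in cluster_counts.values() for f in counts}
    return class_docs, cluster_counts, vocab

def classify_nb(model, cluster_of, doc):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_docs, cluster_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        total = sum(cluster_counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for w in doc:
            feature = cluster_of.get(w, w)
            if feature in vocab:  # skip features never seen in training
                count = cluster_counts[label][feature]
                score += math.log((count + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical hand-made clusters of words that indicate the same category.
cluster_of = {"goal": "SPORT", "match": "SPORT", "team": "SPORT",
              "stock": "MONEY", "market": "MONEY", "price": "MONEY"}
data = [("goal team".split(), "sports"), ("stock market".split(), "finance")]
model = train_nb(data, cluster_of)
prediction = classify_nb(model, cluster_of, ["match"])
```

The word "match" never occurs in the training documents, yet it is classified through its cluster, which illustrates why grouping terms that indicate the same category can reduce the feature space without discarding their evidence.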
- (Joachims, 1998) ⇒ Thorsten Joachims. (1998). “Text Categorization with Support Vector Machines: Learning with Many Relevant Features.” In: Proceedings of the European Conference on Machine Learning (ECML 1998).
- (Lam & Ho, 1998) ⇒ Wai Lam, and Chao Yang Ho. (1998). “Using a Generalized Instance Set for Automatic Text Categorization.” In: Proceedings of the 21st ACM SIGIR Conference retrieval (SIGIR 1998). doi:10.1145/290941.290961
- (McCallum & Nigam, 1998) ⇒ Andrew McCallum, and Kamal Nigam. (1998). “A Comparison of Event Models for Naive Bayes Text Classification.” In: Proceedings of AAAI-98 Workshop on Learning for Text Categorization.
1997
- Yiming Yang, and Jan O. Pedersen. (1997). “A Comparative Study on Feature Selection in Text Categorization.” In: MACHINE LEARNING-INTERNATIONAL WORKSHOP THEN CONFERENCE-....
- Hwee Tou Ng, Wei Boon Goh, and Kok Leong Low. (1997). “Feature selection, perceptron learning, and a usability case study for text categorization.” In: Proceedings of the 20th ACM SIGIR Conference Retrieval.
- Daphne Koller, and Mehran Sahami. (1997). “Hierarchically Classifying Documents Using Very Few Words.” In: Proceedings of the Fourteenth International Conference on Machine Learning (ICML 1997).
1996
- David D. Lewis, Robert E. Schapire, James P. Callan, and Ron Papka. (1996). “Training algorithms for linear text classifiers.” In: Proceedings of the 19th ACM SIGIR Conference retrieval.
- William W. Cohen, and Yoram Singer. (1996). “Context-Sensitive Learning Methods for Text Categorization.” In: Proceedings of the 19th ACM SIGIR Conference Retrieval.
- I. Moulinier, G. Raskinis, and J. Ganascia. (1996). “Text categorization: a symbolic approach.” In: Proceedings of the Fifth Annual Symposium on Document Analysis and Information Retrieval.
1995
- E. Wiener, J.O. Pedersen, and A.S. Weigend. (1995). “A neural network approach to topic spotting.” In: Proceedings of the Fourth Annual Symposium on Document Analysis and Information Retrieval (SDAIR 1995).
- William W. Cohen. (1995). “Text Categorization and Relational Learning.” In: The Twelfth International Conference on Machine Learning (ICML 1995).
1994
- Chidanand Apté, Fred Damerau, and Sholom M. Weiss. (1994). “Towards Language Independent Automated Learning of Text Categorization Models.” In: Proceedings of the 17th ACM SIGIR Conference Retrieval.
- D. D. Lewis, and M. Ringuette. (1994). “Comparison of Two Learning Algorithms for Text Categorization.” In: Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR 1994).
- (Yang & Chute, 1994) ⇒ Yiming Yang, and Christopher G. Chute. (1994). “An Example-based Mapping Method for Text Categorization and Retrieval.” In: ACM Transactions on Information Systems (TOIS 1994), 12(3).
- (Cavnar & Trenkle, 1994) ⇒ William B. Cavnar, and John M. Trenkle. (1994). “N-gram-based Text Categorization.” In: Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval.
1993
- Kostas Tzeras, and Stephan Hartmann. (1993). “Automatic Indexing Based on Bayesian Inference Networks.” In: Proceedings of the 16th ACM SIGIR Conference retrieval (SIGIR 1993).
1992
- (Masand et al., 1992) ⇒ Brij Masand, Gordon Linoff, and David Waltz. (1992). “Classifying News Stories Using Memory Based Reasoning.” In: Proceedings of the 15th ACM SIGIR Conference.
- It proposes a k-Nearest Neighbor-based Supervised Text Classification Algorithm.
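A k-nearest-neighbor text classifier of the kind noted above can be sketched in a few lines. The cosine similarity over raw word counts, the unweighted majority vote, the function names, and the toy data are all assumptions for illustration; this entry does not describe the similarity measure or voting scheme of the cited system.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(n * b[t] for t, n in a.items() if t in b)
    norm = math.sqrt(sum(n * n for n in a.values())) * \
           math.sqrt(sum(n * n for n in b.values()))
    return dot / norm if norm else 0.0

def knn_classify(train, doc, k=3):
    """Label `doc` by majority vote among its k most similar training items."""
    query = Counter(doc)
    ranked = sorted(train, key=lambda item: cosine(query, Counter(item[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled news snippets.
train = [
    ("election vote senate".split(), "politics"),
    ("senate passes bill".split(), "politics"),
    ("vote on budget bill".split(), "politics"),
    ("team wins final".split(), "sports"),
    ("coach praises team".split(), "sports"),
]
label = knn_classify(train, "senate vote bill".split(), k=3)
```

No model is fitted ahead of time: the labeled examples themselves are kept in memory and consulted at prediction time, which is what makes the approach memory-based.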
1991
- N. Fuhr, S. Hartmann, G. Lustig, M. Schwantner, and K. Tzeras. (1991). “Air/x - a rule-based Multistage Indexing System for Large Subject Fields.” In: Proceedings of RIAO 1991.