Text-Item Classification Algorithm
A Text-Item Classification Algorithm is a classification algorithm that can be implemented by a text-item classification system to solve an automated text classification task.
- Context:
- It can range from being a Single-Label Text Classification Algorithm to being (typically) a Multi-Label Text Classification Algorithm.
- It can range from being a Heuristic Text Classification Algorithm to being a Data-Driven Text Classification Algorithm (such as a supervised text classification algorithm).
- It can range from being a Short-Text Text-Item Classification Algorithm to being a Long-Text Text-Item Classification Algorithm.
- ...
- It can use Domain Adaptation to improve performance when the intent categories vary across application contexts.
- It can use Data Augmentation to handle small datasets and improve generalization.
- It can apply Class Imbalance Handling Techniques to ensure robustness across varied label distributions.
- It can involve Sequence-to-Label Models for ordered text sequences, such as sentences or paragraphs.
- It can leverage Hierarchical Text Classification when dealing with nested or multi-level class structures.
- It can include Zero-Shot Classification methods for predicting unseen classes using contextual knowledge.
- It can face challenges such as Class Overlap, Label Ambiguity, and Data Sparsity.
- It can support Human-in-the-Loop Learning approaches to refine label definitions or adjust model behavior.
- ...
- Example(s):
- a Text-Item Intent Classification Algorithm (for text-item intent classification tasks such as mapping user commands to intent classes).
- a Word-Level Classification Algorithm (for word-level classification tasks such as part-of-speech tagging or named entity classification).
- a Sentence Classification Algorithm (for sentence-level classification tasks such as identifying question, statement, or command types).
- a Document Topic Classification Algorithm (for document-level classification tasks such as categorizing research papers into subject categories like science, sports, or finance).
- a Sentiment Analysis Algorithm (for sentiment classification tasks such as determining positive, neutral, or negative sentiment in social media posts).
- a Spam Email Classification Algorithm (for email classification tasks such as identifying spam versus ham in email texts).
- a Legal Text Classification Algorithm (for legal text classification tasks such as categorizing legal documents based on case types or legal issues).
- a Fake News Detection Algorithm (for news classification tasks such as detecting misinformation or fake news in news articles).
- a Text-Item Topic Classification Algorithm (for short-text classification tasks such as classifying tweets into predefined topic categories).
- a Medical Text Classification Algorithm (for clinical text classification tasks such as categorizing clinical notes or medical records into disease categories or treatment types).
- a Toxic Comment Classification Algorithm (for online comment classification tasks such as detecting toxic language or offensive content in social media comments).
- an Emotion Classification Algorithm (for emotion detection tasks such as identifying emotional states like joy, sadness, or anger in customer feedback).
- a Language Identification Algorithm (for language identification tasks such as determining the language of a given text-item, such as English, Spanish, or French).
- a Text-Item Genre Classification Algorithm (for literary genre classification tasks such as classifying books or short stories into genre categories like science fiction, fantasy, or romance).
- a Question Classification Algorithm (for question classification tasks such as identifying question types like factual, opinion-based, or recommendation requests in Q&A forums).
- a Code Snippet Classification Algorithm (for code classification tasks such as identifying programming languages or code functionality in code repositories).
- a Dialogue Act Classification Algorithm (for dialogue act classification tasks such as detecting greetings, requests, or offers in conversational text).
- a Product Review Classification Algorithm (for review classification tasks such as identifying product categories or aspect-based sentiments in customer reviews).
- a News Article Classification Algorithm (for news classification tasks such as identifying news categories like politics, entertainment, or sports in online news articles).
- …
- Counter-Example(s):
- See: Sentiment Classification Algorithm, Labeled Corpus.
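The range from a Heuristic Text Classification Algorithm to a Data-Driven Text Classification Algorithm noted in the Context above can be illustrated with a minimal sketch. It assumes a toy spam-versus-ham task; the keyword set, the training texts, and all function and class names are hypothetical and chosen only for illustration. The data-driven side uses a multinomial Naive Bayes model with add-one smoothing, one of several possible supervised choices.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


# Heuristic end of the range: hand-written keyword rules (hypothetical list).
SPAM_KEYWORDS = {"free", "winner", "prize"}


def heuristic_classify(text):
    """Label a text 'spam' if it contains any rule keyword, else 'ham'."""
    return "spam" if SPAM_KEYWORDS & set(tokenize(text)) else "ham"


class NaiveBayesTextClassifier:
    """Data-driven end: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total = len(labels)
        return self

    def predict(self, text):
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / self.total)  # log prior
            n_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore out-of-vocabulary words
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy labeled corpus (made up for this sketch).
TRAIN_TEXTS = [
    "win a free prize now",
    "free money winner",
    "meeting agenda for monday",
    "lunch on monday with the team",
]
TRAIN_LABELS = ["spam", "spam", "ham", "ham"]

model = NaiveBayesTextClassifier().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(heuristic_classify("free lunch on monday"))
print(model.predict("free lunch on monday"))
```

On this toy data the two approaches disagree on "free lunch on monday": the keyword rule fires on "free", while the learned model weighs all the words against what it saw in training. Both are single-label classifiers; a multi-label variant would return every label whose score passes a threshold rather than only the best one.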
References
2020
- (Alzamzami et al., 2020) ⇒ Fatimah Alzamzami, Mohamad Hoda, and Abdulmotaleb El Saddik. (2020). “Light Gradient Boosting Machine for General Sentiment Classification on Short Texts: A Comparative Evaluation.” IEEE Access, 8. DOI:10.1109/ACCESS.2020.2997330
- (Qi, 2020) ⇒ Zhang Qi. (2020). “The Text Classification of Theft Crime based on TF-IDF and XGBoost Model.” In: 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA-2020)
2007
- (Shehata et al., 2007) ⇒ Shady Shehata, Fakhri Karray, and Mohamed Kamel. (2007). “A Concept-based Model for Enhancing Text Categorization.” In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2007).
- (Thet et al., 2007) ⇒ Tun Thura Thet, Jin-Cheon Na, and Christopher S. G. Khoo. (2007). “Filtering Product Reviews from Web Search Results.” In: Proceedings of the 2007 ACM symposium on Document Engineering.
2002
- (Lodhi et al., 2002) ⇒ Huma Lodhi, Craig Saunders, John Shawe-Taylor, Nello Cristianini, and Chris Watkins. (2002). “Text Classification Using String Kernels.” In: The Journal of Machine Learning Research, 2.
- (Sebastiani, 2002) ⇒ Fabrizio Sebastiani. (2002). “Machine Learning in Automated Text Categorization.” In: ACM Computing Surveys (CSUR), 34(1).
2000
- (Nigam et al., 2000) ⇒ Kamal Nigam, Andrew McCallum, Sebastian Thrun, and Tom M. Mitchell. (2000). “Text Classification from Labeled and Unlabeled Documents Using EM.” In: Machine Learning. doi:10.1023/A:1007692713085
1999
- (McCallum, 1999) ⇒ Andrew McCallum. (1999). “Multi-label Text Classification with a Mixture Model Trained by EM.” In: AAAI 99 Workshop on Text Learning.
1998
- (Joachims, 1998) ⇒ Thorsten Joachims. (1998). “Text Categorization with Support Vector Machines: Learning with Many Relevant Features.” In: Proceedings of the European Conference on Machine Learning (ECML 1998).
- (Apte et al., 1998) ⇒ C. Apte, F. Damerau, and Sholom M. Weiss. (1998). “Text Mining with Decision Rules and Decision Trees.” In: Proceedings of the Conference on Automated Learning and Discovery, Workshop 6: Learning from Text and the Web.
- (Baker & McCallum, 1998) ⇒ L. Douglas Baker, and Andrew McCallum. (1998). “Distributional Clustering of Words for Text Classification.” In: Proceedings of the 21st ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1998).
- (Lam & Ho, 1998) ⇒ Wai Lam, and Chao Yang Ho. (1998). “Using a Generalized Instance Set for Automatic Text Categorization.” In: Proceedings of the 21st ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1998). doi:10.1145/290941.290961
- (McCallum & Nigam, 1998) ⇒ Andrew McCallum, and Kamal Nigam. (1998). “A Comparison of Event Models for Naive Bayes Text Classification.” In: Proceedings of AAAI-98 Workshop on Learning for Text Categorization.
1996
- (Lewis et al., 1996) ⇒ David D. Lewis, Robert E. Schapire, James P. Callan, and Ron Papka. (1996). “Training Algorithms for Linear Text Classifiers.” In: Proceedings of the 19th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1996).
- (Cohen & Singer, 1996) ⇒ William W. Cohen, and Yoram Singer. (1996). “Context-Sensitive Learning Methods for Text Categorization.” In: Proceedings of the 19th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1996).
- (Moulinier et al., 1996) ⇒ I. Moulinier, G. Raskinis, and J. Ganascia. (1996). “Text Categorization: A Symbolic Approach.” In: Proceedings of the Fifth Annual Symposium on Document Analysis and Information Retrieval.
1995
- (Cohen, 1995) ⇒ William W. Cohen. (1995). “Text Categorization and Relational Learning.” In: Proceedings of the Twelfth International Conference on Machine Learning (ICML 1995).
1994
- (Apté et al., 1994) ⇒ Chidanand Apté, Fred Damerau, and Sholom M. Weiss. (1994). “Towards Language Independent Automated Learning of Text Categorization Models.” In: Proceedings of the 17th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1994).
- (Yang & Chute, 1994) ⇒ Yiming Yang, and Christopher G. Chute. (1994). “An Example-based Mapping Method for Text Categorization and Retrieval.” In: ACM Transactions on Information Systems (TOIS 1994), 12(3).
- (Cavnar & Trenkle, 1994) ⇒ William B. Cavnar, and John M. Trenkle. (1994). “N-gram-based Text Categorization.” In: Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval.
1993
- (Tzeras & Hartmann, 1993) ⇒ Kostas Tzeras, and Stephan Hartmann. (1993). “Automatic Indexing Based on Bayesian Inference Networks.” In: Proceedings of the 16th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1993).
1991
- (Fuhr et al., 1991) ⇒ N. Fuhr, S. Hartmann, G. Lustig, M. Schwantner, and K. Tzeras. (1991). “AIR/X - a Rule-based Multistage Indexing System for Large Subject Fields.” In: Proceedings of RIAO 1991.
1975
- (Field, 1975) ⇒ B. J. Field. (1975). “Towards Automatic Indexing: Automatic Assignment of Controlled-Language Indexing and Classification from Free Indexing.” In: Journal of Documentation, 31(4). doi:10.1108/eb026605
1963
- (Borko & Bernick, 1963) ⇒ Harold Borko, and Myrna Bernick. (1963). “Automatic Document Classification.” In: Journal of the ACM (JACM).