Document Classification Task
A Document Classification Task is a classification task whose input is a document and whose output is a document category from a document category set.
- Context:
- It can range from being a Text Item Classification Task to being a Structured Document Classification Task.
- It can range from being a Heuristic Document Classification Task to being a Data-Driven Document Classification Task (such as supervised document classification).
- It can be solved by a Document Classification System (that implements a Document Classification Algorithm).
- It can be supported by a Document Index Creation Task.
- …
- Example(s):
- Counter-Example(s):
- See: Document Semantic Parsing Task.
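The range from a heuristic to a data-driven task noted in the context above can be illustrated with a minimal sketch. It assumes a hypothetical two-category set ("sports", "finance"), hand-written keyword rules for the heuristic case, and a tiny hand-labeled training set with a multinomial Naive Bayes model for the supervised case. All names, rules, and example documents here are illustrative assumptions, not part of any particular Document Classification System.

```python
import math
from collections import Counter, defaultdict

# Hypothetical document category set with keyword rules (illustrative only).
KEYWORD_RULES = {
    "sports": {"match", "goal", "team"},
    "finance": {"stock", "market", "bank"},
}


def tokenize(document):
    """Lowercase the document and split it on whitespace."""
    return document.lower().split()


def heuristic_classify(document, rules=KEYWORD_RULES, default="other"):
    """Heuristic approach: pick the category whose keywords occur most often."""
    tokens = tokenize(document)
    scores = {cat: sum(tok in kws for tok in tokens) for cat, kws in rules.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def train_naive_bayes(labeled_documents):
    """Data-driven approach: collect per-category word counts from labeled examples."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocabulary = set()
    for document, category in labeled_documents:
        tokens = tokenize(document)
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocabulary.update(tokens)
    return word_counts, doc_counts, vocabulary


def naive_bayes_classify(document, model):
    """Return the category with the highest log-posterior, using add-one smoothing."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, float("-inf")
    for category in doc_counts:
        total_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)  # log prior
        for token in tokenize(document):
            if token in vocabulary:  # ignore words never seen in training
                count = word_counts[category][token]
                score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Toy labeled training set (assumed for illustration).
TRAINING_SET = [
    ("the team scored a late goal", "sports"),
    ("the match ended in a draw", "sports"),
    ("the stock market fell sharply", "finance"),
    ("the bank raised interest rates", "finance"),
]
MODEL = train_naive_bayes(TRAINING_SET)
```

Calling `heuristic_classify("The team won the match")` applies the fixed rules directly, while `naive_bayes_classify("the bank and the stock market", MODEL)` relies only on statistics learned from the labeled examples. In both cases the input is a document and the output is one category from the category set.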
References
2014
- (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/Document_classification Retrieved:2014-10-31.
- Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. This may be done "manually" (or "intellectually") or algorithmically. The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents is used mainly in information science and computer science. The problems are overlapping, however, and there is therefore also interdisciplinary research on document classification.
The documents to be classified may be texts, images, music, etc. Each kind of document possesses its special classification problems. When not otherwise specified, text classification is implied.
Documents may be classified according to their subjects or according to other attributes (such as document type, author, printing year etc.). In the rest of this article only subject classification is considered. There are two main philosophies of subject classification of documents: The content based approach and the request based approach.