Text Segmentation Task
A Text Segmentation Task is a text processing task (specifically, a string segmentation task) that requires annotating a text with coherent text segments.
- Context:
- Input: Digital Text Items.
- Output: Segmented Text Items.
- Measure: Text Segmentation Performance Measures.
- It can be solved by a Text Segmentation System (that implements a text segmentation algorithm).
- It can range from being a Full Text Segmentation Task to being a Partial Text Segmentation Task.
- It can range from being a Syntactic Text Chunking Task (such as VP chunking) to being a Semantic Text Chunking Task (such as NER).
- It can range from being a Heuristic Text Segmentation Task to being a Data-Driven Text Segmentation Task (such as supervised text segmentation).
- It can range from being a Language-Specific Text Segmentation Task to being a Language-Agnostic Text Segmentation Task.
- …
- Example(s):
- Syntactic Text Segmentation Tasks, such as:
- a ___ Chunking Task, such as ...
- a Syntactic-Phrase Chunking Task, such as noun phrase chunking.
- a Written Sentence Segmentation Task.
- a Written Phrase Segmentation Task.
- a Written Word Mention Segmentation Task, such as orthographic word segmentation.
- …
- Semantic Text Segmentation Tasks, such as:
- a Semantic Text Chunking Task, such as: NER.
- Word and Subword Segmentation Tasks, such as:
- a Text Word Segmentation Task, such as:
- a Morph Segmentation Task, such as:
- Other ...
- …
- Topic Segmentation Task.
- ...
- Document Segmentation Task.
- …
- Counter-Example(s):
- a Handwritten Item Segmentation Task.
- a Software Statement Tokenization Task, such as x1=1.5; ⇒ [x1][=][1.5][;].
- a Text Token Tagging Task, such as POS Tagging.
- a Speech Segmentation Task.
- a DNA Segmentation Task.
- See: Text Segment, Linguistic Topic.
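The input/output framing above (digital text in, annotated segments out) can be illustrated with a minimal sketch of a Heuristic Text Segmentation Task instance: written sentence segmentation by terminal punctuation. The function name `segment_sentences` and the punctuation rule are assumptions made for illustration, not part of any referenced system. The rule is deliberately naive and will mis-split abbreviations such as "Dr. Smith".

```python
import re

# Assumed heuristic: a sentence ends at '.', '!' or '?' when the mark is
# followed by whitespace or by the end of the text.
_SENTENCE_END = re.compile(r"[.!?]+(?=\s|$)")


def _skip_whitespace(text, pos, limit):
    """Advance pos past whitespace characters, stopping at limit."""
    while pos < limit and text[pos].isspace():
        pos += 1
    return pos


def segment_sentences(text):
    """Return (start, end) character spans, one per sentence-like segment."""
    spans = []
    start = 0
    for match in _SENTENCE_END.finditer(text):
        end = match.end()
        start = _skip_whitespace(text, start, end)
        if start < end:
            spans.append((start, end))
        start = end
    # Keep any trailing text that lacks terminal punctuation.
    start = _skip_whitespace(text, start, len(text))
    if start < len(text):
        spans.append((start, len(text)))
    return spans


text = "It rains. Stay in! Why"
segments = [text[s:e] for s, e in segment_sentences(text)]
```

Returning character spans rather than substrings keeps the output an annotation over the original text, so segments can be aligned with other annotations on the same item.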
References
2022
- (Wikipedia, 2022) ⇒ https://en.wikipedia.org/wiki/Text_segmentation Retrieved:2022-3-21.
- Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics. The term applies both to mental processes used by humans when reading text, and to artificial processes implemented in computers, which are the subject of natural language processing. The problem is non-trivial, because while some written languages have explicit word boundary markers, such as the word spaces of written English and the distinctive initial, medial and final letter shapes of Arabic, such signals are sometimes ambiguous and not present in all written languages.
Compare speech segmentation, the process of dividing speech into linguistically meaningful portions.
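For text without explicit word boundary markers, one simple baseline is greedy forward maximum matching against a word list. The sketch below is illustrative only: the function name `max_match`, the toy lexicon, and the single-character fallback are assumptions, and the greedy strategy is known to produce wrong splits when a longer lexicon entry overlaps the correct boundary.

```python
def max_match(text, lexicon, max_len=None):
    """Greedily split text into the longest lexicon entries, left to right.

    Characters that start no lexicon entry are emitted as single-character
    tokens so that the loop always makes progress.
    """
    if max_len is None:
        max_len = max((len(word) for word in lexicon), default=1)
    max_len = max(1, max_len)

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy lexicon, using unspaced English as a stand-in.
toy_lexicon = {"the", "table", "down", "there"}
words = max_match("thetabledownthere", toy_lexicon)
```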
2005
- (McDonald et al., 2005) ⇒ Ryan McDonald, Koby Crammer, and Fernando Pereira. (2005). “Flexible text segmentation with structured multilabel classification.” In: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT/EMNLP, 2005).
2000
- (McCallum et al., 2000) ⇒ Andrew McCallum, Dayne Freitag, and Fernando Pereira. (2000). “Maximum Entropy Markov Models for Information Extraction and Segmentation.” In: Proceedings of ICML-2000.
- (Choi, 2000) ⇒ Freddy Y. Y. Choi. (2000). “Advances in Domain Independent Linear Text Segmentation.” In: Proceedings of the 1st North American chapter of the Association for Computational Linguistics Conference.
1999
- (Beeferman et al., 1999) ⇒ Doug Beeferman, Adam Berger, and John D. Lafferty. (1999). “Statistical Models for Text Segmentation.” In: Machine Learning, 34(1–3).
- QUOTE: This paper introduces a new statistical approach to automatically partitioning text into coherent segments. ... Assessment of our approach on quantitative and qualitative grounds demonstrates its effectiveness in two very different domains, Wall Street Journal news articles and television broadcast news story transcripts. Quantitative results on these domains are presented using a new probabilistically motivated error metric, which combines precision and recall in a natural and flexible way. This metric is used to make a quantitative assessment of the relative contributions of the different feature types, as well as a comparison with decision trees and previously proposed text segmentation algorithms.
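The error metric mentioned in the quote is often referred to as P_k in later literature. The sketch below shows a commonly used simplified form: slide a window of width k over the text units and count how often the reference and the hypothesis disagree about whether the two window ends fall in the same segment. The function names, the segment-length input format, and the default choice of k (half the mean reference segment length) are assumptions for illustration, and the paper's own formulation may differ in detail.

```python
def to_labels(segment_lengths):
    """Expand segment lengths, e.g. [3, 2], into per-unit ids [0, 0, 0, 1, 1]."""
    labels = []
    for segment_id, length in enumerate(segment_lengths):
        labels.extend([segment_id] * length)
    return labels


def p_k(reference_lengths, hypothesis_lengths, k=None):
    """Fraction of width-k windows on which the two segmentations disagree.

    Lower is better; 0.0 means no window-level disagreement.
    """
    ref = to_labels(reference_lengths)
    hyp = to_labels(hypothesis_lengths)
    if len(ref) != len(hyp):
        raise ValueError("segmentations must cover the same number of units")
    if k is None:
        # Assumed default: half the mean reference segment length.
        k = max(1, round(len(ref) / (2 * len(reference_lengths))))
    windows = len(ref) - k
    if windows <= 0:
        raise ValueError("k must be smaller than the number of units")
    errors = sum(
        (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k])
        for i in range(windows)
    )
    return errors / windows


perfect = p_k([3, 3], [3, 3], k=2)
missed_boundary = p_k([3, 3], [6], k=2)
```

In this toy case the hypothesis that misses the single boundary is penalized on the two windows that straddle it, out of four windows in total.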
1988
- (Hobbs et al., 1988) ⇒ Jerry R. Hobbs, Mark Stickel, Paul Martin, and Douglas Edwards. (1988). “Interpretation as Abduction.” In: Proceedings of the 26th annual meeting on Association for Computational Linguistics (ACL 1988).