Orthographic-Linguistic Annotation System
An Orthographic-Linguistic Annotation System is a Linguistic Annotation System whose input is a written language dataset (text item).
- AKA: Natural Language Orthographic Annotation System.
- Context:
- It solves an Orthographic-Linguistic Annotation Task by implementing Orthographic-Linguistic Annotation Algorithms.
- It usually integrates a Tokenization System and a Sentence Boundary Detection System.
- Example(s):
- A Text Annotation System such as:
- A Corpus Annotation System,
- Counter-Example(s):
- See: UIMA, CoNLL File Format, Annotation System, Document Annotation System, Natural Language Processing System, Natural Language Understanding System.
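The integration of a Tokenization System and a Sentence Boundary Detection System noted in the Context items can be illustrated with a minimal sketch. It is not a reference implementation: the names `Span`, `tokenize`, `split_sentences`, and `annotate` are hypothetical, and the regular-expression rules are deliberately naive assumptions (for example, abbreviations such as "Dr." would be split incorrectly). The sketch only shows both orthographic-level tasks producing character-offset annotations over the same text item.

```python
import re
from typing import List, NamedTuple


class Span(NamedTuple):
    """A stand-off annotation: a labelled character range over the input text."""
    kind: str   # "token" or "sentence"
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    text: str


# Assumption: a token is a run of word characters or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
# Assumption: a sentence may end at a run of '.', '!' or '?'.
SENT_END_RE = re.compile(r"[.!?]+")


def tokenize(text: str) -> List[Span]:
    """Segment the text into tokens with their character offsets."""
    return [Span("token", m.start(), m.end(), m.group())
            for m in TOKEN_RE.finditer(text)]


def _trimmed(text: str, start: int, end: int) -> List[Span]:
    """Return a sentence span with surrounding whitespace removed, if non-empty."""
    segment = text[start:end]
    if not segment.strip():
        return []
    new_start = start + len(segment) - len(segment.lstrip())
    new_end = start + len(segment.rstrip())
    return [Span("sentence", new_start, new_end, text[new_start:new_end])]


def split_sentences(text: str) -> List[Span]:
    """Naive sentence boundary detection: terminal punctuation counts as a
    boundary only when followed by whitespace or the end of the text."""
    spans: List[Span] = []
    start = 0
    for m in SENT_END_RE.finditer(text):
        end = m.end()
        if end == len(text) or text[end].isspace():
            spans.extend(_trimmed(text, start, end))
            start = end
    spans.extend(_trimmed(text, start, len(text)))  # text after the last boundary
    return spans


def annotate(text: str) -> List[Span]:
    """Run both orthographic-level tasks and merge their annotations,
    ordered by start offset with enclosing spans first."""
    return sorted(tokenize(text) + split_sentences(text),
                  key=lambda s: (s.start, -s.end))


if __name__ == "__main__":
    for span in annotate("Hello world. Is it 3.5 degrees? Yes!"):
        print(span.kind, span.start, span.end, repr(span.text))
```

Because the two functions operate independently on the raw text, either can be run first, and higher-level annotation layers can then refer to the resulting offsets.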
References
2009
- (Wilcock, 2009) ⇒ Graham Wilcock. (2009). “Introduction to Linguistic Annotation and Text Analytics.” In: Synthesis Lectures on Human Language Technologies. Morgan & Claypool. doi:10.2200/S00194ED1V01Y200905HLT003 ISBN:1598297384
- QUOTE: The current state of the art in linguistic annotation also divides the different annotation tasks into different levels, which can be arranged into a similar set of layers as shown in Figure 2.2. However, there is only an approximate correspondence between the levels of the tasks performed in practical corpus annotation work and the levels of description in linguistic theory.
(...)
This book focusses on the annotation of texts, where the language is written not spoken, so we do not include an annotation level matching phonology. The annotation tasks that deal with the level of orthography are tokenization and sentence boundary detection. These tasks segment the text into distinct words (tokens) and distinct sentences. It does not usually matter which of these two tasks is performed first, but it is important that both tasks are performed before the higher-level tasks are done.