Annotation Process Model
An Annotation Process Model is a process model that represents an annotation process.
- Context:
- It can (often) be created by an Annotation Process Modeling Task.
- It can define roles and responsibilities in an annotation project.
- It can outline the tools and techniques used for creating annotations, such as annotation tools and annotation guidelines.
- It can range from being a Descriptive Annotation Process Model to being a Prescriptive Annotation Process Model to being an Explanatory Annotation Process Model.
- It can guide the steps and workflows involved in the Annotation Process.
- ...
- Example(s):
- Counter-Example(s):
- ...
- See: Organizational Workflow, Model Instance, Process Redesign.
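The Context items above (roles and responsibilities, tools and guidelines, ordered workflow steps, and the prescriptive/descriptive distinction) can be illustrated with a minimal Python sketch. The names `Stage`, `AnnotationProcessModel`, `conforms`, and the example stage names are hypothetical and do not come from a standard representation. In this sketch, a prescriptive model is an ordered list of stages. An observed sequence of stages, the kind of record a descriptive model would capture, is checked against that list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stage:
    """One step in an annotation workflow (hypothetical structure)."""
    name: str
    role: str  # who is responsible for this step, e.g. "annotator"
    tools: List[str] = field(default_factory=list)  # e.g. annotation tool, guidelines


@dataclass
class AnnotationProcessModel:
    """A prescriptive model: the ordered stages the process should follow."""
    stages: List[Stage]

    def roles(self) -> List[str]:
        """Distinct roles, in order of first appearance."""
        seen: List[str] = []
        for stage in self.stages:
            if stage.role not in seen:
                seen.append(stage.role)
        return seen

    def next_stage(self, current: str) -> Optional[str]:
        """Name of the stage that follows `current`, or None if it is the last."""
        names = [s.name for s in self.stages]
        i = names.index(current)  # raises ValueError for an unknown stage
        return names[i + 1] if i + 1 < len(names) else None

    def conforms(self, observed: List[str]) -> bool:
        """True if an observed stage sequence matches the prescribed order exactly."""
        return observed == [s.name for s in self.stages]


# Hypothetical three-stage process for illustration.
model = AnnotationProcessModel([
    Stage("write guidelines", "project lead", ["annotation guidelines"]),
    Stage("annotate", "annotator", ["annotation tool", "annotation guidelines"]),
    Stage("adjudicate", "adjudicator", ["annotation tool"]),
])
```

With this model, `model.roles()` returns `["project lead", "annotator", "adjudicator"]`, and `model.next_stage("annotate")` returns `"adjudicate"`. A real model would likely allow loops, such as re-annotation after adjudication. The strict sequence here is a simplification.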
References
2024
- (Tan et al., 2024) ⇒ Zhen Tan, Alimohammad Beigi, Song Wang, Ruocheng Guo, Amrita Bhattacharjee, Bohan Jiang, Mansooreh Karami, Jundong Li, Lu Cheng, and Huan Liu. (2024). "Large Language Models for Data Annotation: A Survey." arXiv preprint arXiv:2402.13446. [arXiv](https://arxiv.org/abs/2402.13446).
- NOTES: This survey explores the use of large language models (LLMs) for data annotation, discussing various techniques like zero-shot and few-shot prompts, and highlighting the potential for LLMs to generate high-quality, context-sensitive annotations.
- QUOTE: “Data annotation generally refers to the labeling or generating of raw data with relevant information, which could be used for improving the efficacy of machine learning models. The process, however, is labor-intensive and costly. The emergence of advanced Large Language Models (LLMs), exemplified by GPT-4, presents an unprecedented opportunity to automate the complicated process of data annotation. While existing surveys have extensively covered LLM architecture, training, and general applications, we uniquely focus on their specific utility for data annotation. This survey contributes to three core aspects: LLM-Based Annotation Generation, LLM-Generated Annotations Assessment, and LLM-Generated Annotations Utilization. Furthermore, this survey includes an in-depth taxonomy of data types that LLMs can annotate, a comprehensive review of learning strategies for models utilizing LLM-generated annotations, and a detailed discussion of the primary challenges and limitations associated with using LLMs for data annotation. Serving as a key guide, this survey aims to assist researchers and practitioners in exploring the potential of the latest LLMs for data annotation, thereby fostering future advancements in this critical field.”
2021
- (Thieu et al., 2021) ⇒ Thanh Thieu, Jonathan Camacho Maldonado, Pei-Shu Ho, Min Ding, Alex Marr, Diane Brandt, Denis Newman-Griffis, Ayah Zirikly, Leighton Chan, and Elizabeth Rasch. (2021). "A Comprehensive Study of Mobility Functioning Information in Clinical Notes: Entity Hierarchy, Corpus Annotation, and Sequence Labeling." International Journal of Medical Informatics, 147, 104351.
- QUOTE: “The study presents a detailed analysis of mobility functioning information within clinical notes, establishing an entity hierarchy, conducting comprehensive corpus annotation, and employing sequence labeling to improve the extraction of relevant information. This work underscores the critical role of systematic annotation in enhancing the accuracy and utility of clinical data analysis.”
- NOTES: This study investigates mobility functioning information in clinical notes, focusing on the development of an entity hierarchy, corpus annotation, and sequence labeling techniques. The study highlights the importance of structured annotation processes for extracting meaningful insights from clinical text data.
- NOTES: Functioning terminology is underpopulated in electronic health records and underrepresented in the Unified Medical Language System.
- NOTES: This is a comprehensive analysis of the Mobility domain of the ICF, including entity analysis, annotation, and machine sequence labeling.
- NOTES: Low-resourced and nested Mobility concepts can be accurately identified by using transfer learning, with models re-trained on an adequate corpus.
- NOTES: The International Classification of Functioning, Disability and Health (ICF) is considered the international standard for describing and coding function and health states.
- NOTES: Inter-annotator agreement (IAA) averaged a 92.3% F1-score on mention text spans and a 96.6% Cohen’s kappa on attribute assignments, showing that the annotation process was highly reliable.
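The two kinds of agreement figure mentioned above can be computed as in the sketch below. It assumes exact-match comparison of spans and one categorical attribute per item. The function names are hypothetical, and the study's own matching criteria may differ. Span agreement is scored as F1 with one annotator treated as the reference. Attribute agreement uses Cohen's kappa, κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label distribution.

```python
from collections import Counter
from typing import Hashable, Sequence, Set, Tuple

Span = Tuple[int, int]  # (start, end) offsets; a hypothetical span representation


def span_f1(spans_a: Set[Span], spans_b: Set[Span]) -> float:
    """Exact-match span F1 between two annotators.

    With A as reference, precision = |A & B| / |B| and recall = |A & B| / |A|;
    their harmonic mean simplifies to 2|A & B| / (|A| + |B|), symmetric in A and B.
    """
    if not spans_a and not spans_b:
        return 1.0  # convention: two empty annotations agree perfectly
    overlap = len(spans_a & spans_b)
    return 2 * overlap / (len(spans_a) + len(spans_b))


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both used one identical label throughout; kappa is undefined
    return (p_o - p_e) / (1 - p_e)
```

For example, `cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"])` gives p_o = 0.75 and p_e = 0.5, so κ = 0.5. For the spans, `span_f1({(0, 5), (10, 15)}, {(0, 5), (20, 25)})` shares one of four spans and returns 0.5.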
2018
- (Gururangan et al., 2018) ⇒ Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A. Smith. (2018). "Annotation Artifacts in Natural Language Inference Data." In: *Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)*. Association for Computational Linguistics, New Orleans, Louisiana. [ACL Anthology](https://aclanthology.org/N18-2017).
- NOTES: This study highlights the presence of annotation artifacts in natural language inference datasets, which can sometimes make it possible to predict labels by looking only at the hypothesis. This suggests the need for improved annotation protocols to avoid overestimating model performance.
2016
- (Finlayson & Erjavec, 2016) ⇒ Mark A. Finlayson, and Tomaž Erjavec. (2016). "Overview of Annotation Creation: Processes & Tools." In: James Pustejovsky and Nancy Ide (eds.) *Handbook of Linguistic Annotation*. New York: Springer. [arXiv:1602.05753](https://doi.org/10.48550/arXiv.1602.05753).
- NOTES: This chapter outlines the complex endeavor of creating linguistic annotations, detailing necessary capabilities and common problems with annotation tools. It emphasizes the importance of tool support for high-quality, reusable annotations.