Annotation Process Model

From GM-RKB

An Annotation Process Model is a data processing process model that represents an annotation process (following a systematic pattern for adding annotations to data items).



References

2024

  • (Tan et al., 2024) ⇒ Zhen Tan, Alimohammad Beigi, Song Wang, Ruocheng Guo, Amrita Bhattacharjee, Bohan Jiang, Mansooreh Karami, Jundong Li, Lu Cheng, and Huan Liu. (2024). "Large Language Models for Data Annotation: A Survey." arXiv preprint arXiv:2402.13446. [arXiv](https://arxiv.org/abs/2402.13446).
    • NOTES: This survey explores the use of large language models (LLMs) for data annotation, discussing various techniques like zero-shot and few-shot prompts, and highlighting the potential for LLMs to generate high-quality, context-sensitive annotations.
    • QUOTE: “Data annotation generally refers to the labeling or generating of raw data with relevant information, which could be used for improving the efficacy of machine learning models. The process, however, is labor-intensive and costly. The emergence of advanced Large Language Models (LLMs), exemplified by GPT-4, presents an unprecedented opportunity to automate the complicated process of data annotation. While existing surveys have extensively covered LLM architecture, training, and general applications, we uniquely focus on their specific utility for data annotation. This survey contributes to three core aspects: LLM-Based Annotation Generation, LLM-Generated Annotations Assessment, and LLM-Generated Annotations Utilization. Furthermore, this survey includes an in-depth taxonomy of data types that LLMs can annotate, a comprehensive review of learning strategies for models utilizing LLM-generated annotations, and a detailed discussion of the primary challenges and limitations associated with using LLMs for data annotation. Serving as a key guide, this survey aims to assist researchers and practitioners in exploring the potential of the latest LLMs for data annotation, thereby fostering future advancements in this critical field.”

2021

  • (Thieu et al., 2021) ⇒ Thanh Thieu, Jonathan Camacho Maldonado, Pei-Shu Ho, Min Ding, Alex Marr, Diane Brandt, Denis Newman-Griffis, Ayah Zirikly, Leighton Chan, and Elizabeth Rasch. (2021). "A Comprehensive Study of Mobility Functioning Information in Clinical Notes: Entity Hierarchy, Corpus Annotation, and Sequence Labeling." International Journal of Medical Informatics, 147, 104351.
    • QUOTE: “The study presents a detailed analysis of mobility functioning information within clinical notes, establishing an entity hierarchy, conducting comprehensive corpus annotation, and employing sequence labeling to improve the extraction of relevant information. This work underscores the critical role of systematic annotation in enhancing the accuracy and utility of clinical data analysis.”
    • NOTES: This study investigates mobility functioning information in clinical notes, focusing on the development of an entity hierarchy, corpus annotation, and sequence labeling techniques. The study highlights the importance of structured annotation processes for extracting meaningful insights from clinical text data.
    • NOTES: Functioning terminology is underpopulated in electronic health records and underrepresented in the Unified Medical Language System.
    • NOTES: This is a comprehensive analysis of the Mobility domain of the ICF, including entity analysis, annotation, and machine sequence labeling.
    • NOTES: Low-resourced and nested Mobility concepts can be accurately identified by using transfer learning re-trained on an adequate corpus.
    • NOTES: The International Classification of Functioning, Disability and Health (ICF) is considered the international standard for describing and coding function and health states.
    • NOTES: Inter-annotator agreement (IAA) averaged 92.3% F1-score on mention text spans and 96.6% Cohen’s kappa on attribute assignments, indicating high reliability of the annotation process.
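
The two agreement measures named in the last note can be illustrated with a minimal Python sketch. It is not the procedure used by Thieu et al.: the function names, the exact-match rule for spans, and the toy data are all assumptions made for illustration. Cohen's kappa corrects the observed agreement between two annotators for the agreement expected by chance, and the span F1 here treats two spans as matching only when their start and end offsets are identical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def span_f1(spans_a, spans_b):
    """Pairwise F1 over exact-match (start, end) spans (an assumed matching rule)."""
    set_a, set_b = set(spans_a), set(spans_b)
    if not set_a and not set_b:
        return 1.0
    matched = len(set_a & set_b)
    # F1 is symmetric, so neither annotator needs to be the reference.
    return 2 * matched / (len(set_a) + len(set_b))


# Hypothetical toy data, not taken from the cited study.
attributes_a = ["yes", "yes", "no", "no"]
attributes_b = ["yes", "no", "no", "no"]
kappa = cohen_kappa(attributes_a, attributes_b)  # 0.5

mentions_a = [(0, 5), (10, 15), (20, 25)]
mentions_b = [(0, 5), (10, 15), (30, 35)]
f1 = span_f1(mentions_a, mentions_b)  # 2 of 3 spans match on each side
```

In the toy data, the annotators agree on 3 of 4 attribute labels (observed agreement 0.75) while chance agreement is 0.5, giving a kappa of 0.5. A relaxed matching rule, such as counting overlapping spans as matches, would generally yield a higher span F1 than the exact-match rule assumed here.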

2018

2016