Manual Annotation Task
A Manual Annotation Task is an annotation task that is performed by a human annotator.
- AKA: Human-based Annotation.
- Context:
- It can (often) be guided by an Annotation Guideline.
- It can produce a Manually Annotated Artifact, such as a manually annotated document or a manually annotated dataset.
- It can range from being a Single Person Manual Annotation Task to being a Crowdsourced Annotation Task (see the illustrative sketch after this list).
- It can range from being a Manual Labeling Task to being a Manual Data Curation Task.
- It can involve Artifact Labeling, Artifact Curation, and Metadata Tagging.
- It can be used in Artifact Preparation.
- ...
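The sketch below is a minimal, purely illustrative Python model of the context items above: a human annotator labels items under an Annotation Guideline, producing a Manually Annotated Artifact, and labels from several annotators can be combined when the task is crowdsourced. All class names, field names, and label values are assumptions made for illustration only, not part of any standard annotation tool or API.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List

# All names below are illustrative assumptions, not a standard annotation API.

@dataclass
class AnnotationGuideline:
    """The written instructions that guide the human annotator."""
    name: str
    version: str
    allowed_labels: List[str]

@dataclass
class ManualAnnotation:
    """One label assigned to one item by one human annotator."""
    item_id: str
    annotator_id: str
    label: str

@dataclass
class ManuallyAnnotatedDataset:
    """A Manually Annotated Artifact: items plus the labels humans assigned to them."""
    guideline: AnnotationGuideline
    annotations: List[ManualAnnotation] = field(default_factory=list)

    def add(self, annotation: ManualAnnotation) -> None:
        """Record one human judgment, enforcing the guideline's label set."""
        if annotation.label not in self.guideline.allowed_labels:
            raise ValueError(f"label {annotation.label!r} is not allowed by the guideline")
        self.annotations.append(annotation)

    def majority_labels(self) -> Dict[str, str]:
        """For a Crowdsourced Annotation Task, collapse several human labels per item
        into a single label by simple majority vote (one common aggregation choice)."""
        votes: Dict[str, Counter] = {}
        for ann in self.annotations:
            votes.setdefault(ann.item_id, Counter())[ann.label] += 1
        return {item_id: counts.most_common(1)[0][0] for item_id, counts in votes.items()}

# Single Person Manual Annotation Task: one annotator labels an item.
guideline = AnnotationGuideline(
    name="Document Sentiment Guideline",   # hypothetical Annotation Guideline
    version="1.0",
    allowed_labels=["positive", "negative", "neutral"],
)
dataset = ManuallyAnnotatedDataset(guideline=guideline)
dataset.add(ManualAnnotation("doc-1", "annotator-A", "positive"))

# Crowdsourced Annotation Task: several annotators label the same item.
dataset.add(ManualAnnotation("doc-2", "annotator-A", "negative"))
dataset.add(ManualAnnotation("doc-2", "annotator-B", "negative"))
dataset.add(ManualAnnotation("doc-2", "annotator-C", "neutral"))

print(dataset.majority_labels())   # {'doc-1': 'positive', 'doc-2': 'negative'}
```

Majority voting is just one simple aggregation choice used here for illustration; the Sabou et al. (2014) reference below proposes best-practice guidelines for crowdsourced corpus annotation in more detail.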
- Example(s):
- Physical Manual Annotation Tasks, such as:
- a Manual Museum Artifact Annotation Task where museum curators label artifacts in a museum with historical and contextual information.
- a Manual Archaeological Annotation Task where archaeologists tag items in an archaeological dig with metadata about their origin and significance.
- a Manual Biological Specimen Annotation Task where lab technicians annotate biological specimens with taxonomic and collection data.
- a Manual Geological Sample Annotation Task where geologists mark geological samples with information on their composition and location of discovery.
- ...
- Digital Manual Annotation Tasks, such as:
- a Manual Document Annotation Task where data analysts label documents with topics, keywords, or other relevant metadata.
- a Manual Image Annotation Task where image analysts label objects within images for use in Computer Vision datasets.
- a Manual Video Annotation Task where video analysts annotate videos for events, activities, or object tracking.
- a Manual Audio Annotation Task where audio transcribers transcribe speech or annotate audio for sound events.
- a Manual Text Annotation Task that involves marking parts of speech in sentences for linguistic analysis (a small illustrative sketch follows this example list).
- a Manual Chatbot Answer Annotation Task, ...
- ...
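As a concrete illustration of the Manual Text Annotation Task example above, the snippet below sketches what a manually part-of-speech-annotated sentence might look like as data. The record fields are assumptions made for illustration; the tags follow the widely used Penn Treebank convention (DT determiner, NN noun, VBD past-tense verb, IN preposition).

```python
# Illustrative sketch only: field names are assumptions, not a standard annotation format.
pos_annotation = {
    "task": "Manual Text Annotation Task",
    "guideline": "POS tagging guideline v1.0",   # hypothetical Annotation Guideline
    "annotator_id": "annotator-A",               # the human annotator
    "sentence": "The cat sat on the mat .",
    "tokens": [
        ("The", "DT"), ("cat", "NN"), ("sat", "VBD"),
        ("on", "IN"), ("the", "DT"), ("mat", "NN"), (".", "."),
    ],
}

# A manually annotated corpus is then simply a collection of such records,
# e.g. annotated_corpus = [pos_annotation, ...].
```

Record formats like this vary by project and tool; the point is only that a human annotator, following a guideline, attaches the labels that make up the Manually Annotated Artifact.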
- Counter-Example(s):
- Automated Annotation Tasks, which use algorithms and machine learning to label data without human intervention.
- Data Collection Tasks, which involve gathering raw data rather than labeling it.
- See: Human Annotation, Data Preparation, Text Annotation Task, Image Annotation Task, Transcription Task, Linguistic Data Consortium.
References
2014
- (Sabou et al., 2014) ⇒ Marta Sabou, Kalina Bontcheva, Leon Derczynski, and Arno Scharl. (2014). “Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines.” In: Proc. LREC.
- QUOTE: Crowdsourcing is an emerging collaborative approach that can be used for the acquisition of annotated corpora and a wide range of other linguistic resources. Although the use of this approach is intensifying in all its key genres (paid-for crowdsourcing, games with a purpose, volunteering-based approaches), the community still lacks a set of best-practice guidelines similar to the annotation best practices for traditional, expert-based corpus acquisition. In this paper we focus on the use of crowdsourcing methods for corpus acquisition and propose a set of best practice guidelines based in our own experiences in this area and an overview of related literature. We also introduce GATE Crowd, a plugin of the GATE platform that relies on these guidelines and offers tool support for using crowdsourcing in a more principled and efficient manner.
Over the past ten years, Natural Language Processing (NLP) research has been driven forward by a growing volume of annotated corpora, produced by evaluation initiatives such as ACE (ACE, 2004), TAC,[1] SemEval and Senseval, [2] and large annotation projects such as OntoNotes (Hovy et al., 2006). These corpora have been essential for training and domain adaptation of NLP algorithms and their quantitative evaluation, as well as for enabling algorithm comparison and repeatable experimentation. Thanks to these efforts, there are now well-understood best practices in how to create annotations of consistently high quality, by employing, training, and managing groups of linguistic and/or domain experts. This process is referred to as “the science of annotation” (Hovy, 2010).
More recently, the emergence of crowdsourcing platforms (e.g. paid-for marketplaces such as Amazon Mechanical Turk (AMT) and CrowdFlower (CF); games with a purpose; and volunteer-based platforms such as crowdcrafting), coupled with growth in internet connectivity, motivated NLP researchers to experiment with crowdsourcing as a novel, collaborative approach for obtaining linguistically annotated corpora. The advantages of crowdsourcing over expert-based annotation have already been discussed elsewhere (Fort et al., 2011; Wang et al., 2012), but in a nutshell, crowdsourcing tends to be cheaper and faster. ...
2009
- (Kulkarni et al., 2009) ⇒ Sayali Kulkarni, Amit Singh, Ganesh Ramakrishnan, Soumen Chakrabarti. (2009). “Collective Annotation of Wikipedia Entities in Web Text.” In: Proceedings of ACM SIGKDD Conference (KDD-2009). doi:10.1145/1557019.1557073.
- QUOTE: In experiments involving over a hundred manually-annotated Web pages and tens of thousands of spots, our approaches significantly outperform recently-proposed algorithms. ...