2006 TheDifficultiesOfTexNamExtr
- (Sautter & Böhm, 2006) ⇒ Guido Sautter, Klemens Böhm. (2006). “The Difficulties of Taxonomic Name Extraction and a Solution.” In: Proceedings of Workshop on BIONLP (BioNLP 2006).
Subject Headings: Organism NER.
Notes
- It suggests that even identifying an Organism Name (Taxonomic Name) in a document is difficult.
Quotes
Abstract
In modern biology, digitization of biosystematics publications is an important task. Extraction of taxonomic names from such documents is one of its major issues. This is because these names identify the various genera and species. This article reports on our experiences with learning techniques for this particular task. We say why established Named-Entity Recognition techniques are somewhat difficult to use in our context. One reason is that we have only very little training data available. Our experiments show that a combining approach that relies on regular expressions, heuristics, and word-level language recognition achieves very high precision and recall and allows to cope with those difficulties.
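The abstract names three ingredients (regular expressions, heuristics, and word-level language recognition) without giving details on this page. As a rough illustration only, and not the authors' actual method, the following minimal sketch shows how such a combination might look for binomial names. The pattern `CANDIDATE`, the stop list `ENGLISH_WORDS` (a crude stand-in for word-level language recognition), the suffix list `LATIN_SUFFIXES`, and the function `extract_candidates` are all hypothetical names and choices introduced here for illustration.

```python
import re

# Assumption: a candidate is a capitalized genus (or an abbreviated one such as "A.")
# followed by a lowercase species epithet of at least three letters.
CANDIDATE = re.compile(r"\b([A-Z][a-z]+|[A-Z]\.)\s+([a-z]{3,})\b")

# Hypothetical, tiny stop list; it only approximates the idea of recognizing
# which words belong to the document's main language.
ENGLISH_WORDS = {"the", "this", "in", "we", "our", "one", "these",
                 "and", "was", "from", "found", "is", "are"}

# Hypothetical heuristic: many Latin epithets end in one of these suffixes.
LATIN_SUFFIXES = ("us", "a", "um", "is", "ae", "i", "ensis")


def extract_candidates(text):
    """Return candidate binomial names found in text, in order of appearance."""
    names = []
    for match in CANDIDATE.finditer(text):
        genus, epithet = match.group(1), match.group(2)
        # Drop candidates whose genus or epithet looks like an ordinary English word.
        if genus.rstrip(".").lower() in ENGLISH_WORDS or epithet in ENGLISH_WORDS:
            continue
        # Keep only epithets with a Latin-looking ending.
        if not epithet.endswith(LATIN_SUFFIXES):
            continue
        names.append(f"{genus} {epithet}")
    return names
```

For example, `extract_candidates("We found Apis mellifera and A. cerana in the samples.")` returns `["Apis mellifera", "A. cerana"]`: the sentence-initial pair "We found" matches the pattern but is rejected by the stop list. A realistic system would need far richer word lists and rules than this sketch.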
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2006 TheDifficultiesOfTexNamExtr | Guido Sautter, Klemens Böhm | | | The Difficulties of Taxonomic Name Extraction and a Solution | | | http://acl.ldc.upenn.edu/W/W06/W06-3325.pdf | | | |