2011 DomainSpecEntityExtractFromNoisy
- (Bratus et al., 2011) ⇒ Sergey Bratus, Anna Rumshisky, Alexy Khrabrov, Rajenda Magar, and Paul Thompson. (2011). “Domain-specific Entity Extraction from Noisy, Unstructured Data Using Ontology-guided Search.” In: International Journal on Document Analysis and Recognition. doi:10.1007/s10032-011-0149-5
Subject Headings: Text analysis; Language models; Information extraction; Ontology-guided Search.
Notes
Cited By
~ 23 http://scholar.google.com/scholar?cluster=16212946850859852842
Quotes
Author Keywords
Text Analysis; Language Models; Information Extraction; Ontology-Guided Search
Abstract
Domain-specific knowledge is often recorded by experts in the form of unstructured text. For example, in the medical domain, clinical notes from electronic health records contain a wealth of information. Similar practices are found in other domains. The challenge we discuss in this paper is how to identify and extract part names from technicians' repair notes, a noisy unstructured text data source from General Motors’ archives of solved vehicle repair problems, with the goal of developing a robust and dynamic reasoning system to be used as a repair adviser by service technicians. In the present work, we discuss two approaches to this problem. We present an algorithm for ontology-guided entity disambiguation that uses existing knowledge sources, such as domain-specific taxonomies and other structured data. We illustrate its use in the automotive domain, using the GM parts ontology and the unit structure of repair manual text to build context models, which are then used to disambiguate mentions of part-related entities in the text. We also describe extraction of part names with a small amount of annotated data using hidden Markov models (HMM) with shrinkage, achieving an F-score of approximately 80%. Next, we used linear-chain conditional random fields (CRF) in order to model observation dependencies present in the repair notes. Using CRF did not lead to improved performance, but a slight improvement over the HMM results was obtained by using a weighted combination of the HMM and CRF models.
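The abstract does not say how the HMM and CRF outputs were weighted, so the following is only an illustrative sketch of one common way to combine two sequence taggers: linearly interpolating their per-token label posteriors and taking the highest-scoring label. The function name `combine_posteriors`, the weight `w`, the label set (`PART`, `O`), and the toy probabilities are all assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List

# One token's label distribution: label -> probability.
Posterior = Dict[str, float]


def combine_posteriors(
    hmm: List[Posterior], crf: List[Posterior], w: float = 0.5
) -> List[str]:
    """Interpolate per-token posteriors (hypothetical scheme) and pick a label.

    score(label) = w * P_hmm(label) + (1 - w) * P_crf(label)
    """
    if len(hmm) != len(crf):
        raise ValueError("both models must score the same tokens")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    labels = []
    for p_hmm, p_crf in zip(hmm, crf):
        # Consider every label that either model assigns mass to.
        candidates = sorted(set(p_hmm) | set(p_crf))
        scores = {
            lab: w * p_hmm.get(lab, 0.0) + (1.0 - w) * p_crf.get(lab, 0.0)
            for lab in candidates
        }
        labels.append(max(candidates, key=lambda lab: scores[lab]))
    return labels


# Made-up posteriors for the tokens "replaced", "fuel", "pump".
hmm_post = [
    {"O": 0.9, "PART": 0.1},
    {"O": 0.6, "PART": 0.4},
    {"O": 0.2, "PART": 0.8},
]
crf_post = [
    {"O": 0.8, "PART": 0.2},
    {"O": 0.3, "PART": 0.7},
    {"O": 0.1, "PART": 0.9},
]

print(combine_posteriors(hmm_post, crf_post, w=0.5))
```

With `w=1.0` the sketch reduces to the first model alone and with `w=0.0` to the second alone; in practice such a weight would typically be tuned on held-out annotated data.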
References
| | Author | title | journal | titleUrl | doi | year |
|---|---|---|---|---|---|---|
| 2011 DomainSpecEntityExtractFromNoisy | Sergey Bratus, Anna Rumshisky, Alexy Khrabrov, Rajenda Magar, Paul Thompson | Domain-specific Entity Extraction from Noisy, Unstructured Data Using Ontology-guided Search | International Journal on Document Analysis and Recognition | http://pages.cs.brandeis.edu/~arum/publications/ijdar2010.pdf | 10.1007/s10032-011-0149-5 | 2011 |