1998 OntBasedExtrAndStructOfInforFromDataRichUnstrDocs

(Embley & al 1998) ⇒ David W. Embley, Douglas M. Campbell, Randy D. Smith, and Stephen W. Liddle. (1998). “Ontology-based Extraction and Structuring of Information from Data-Rich Unstructured Documents.” In: Proceedings of the Seventh International Conference on Information and Knowledge Management (CIKM 1998). doi:10.1145/288627.288641

Subject Headings:

Notes

Cited By

Quotes

Abstract

We present a new approach to extracting information from unstructured documents based on an application ontology that describes a domain of interest. Starting with such an ontology, we formulate rules to extract constants and context keywords from unstructured documents. For each unstructured document of interest, we extract its constants and keywords and apply a recognizer to organize extracted constants as attribute values of tuples in a generated database schema. To make our approach general, we fix all the processes and change only the ontological description for a different application domain. In experiments we conducted on two different types of unstructured documents taken from the Web, our approach attained recall ratios in the 80% and 90% range and precision ratios near 98%.
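
The pipeline the abstract describes (ontology, then extraction rules for constants and context keywords, then a recognizer that fills one tuple) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy car-ad ontology, the regular expressions, the names `ONTOLOGY`, `extract`, and `recognize`, and the nearest-keyword heuristic are hypothetical and are not the rule language or recognizer used in the paper.

```python
import re

# Hypothetical toy ontology for a car-ad domain (an assumption, not from the paper).
# Each attribute has a regex for its constants and optional context keywords.
ONTOLOGY = {
    "year": {"constant": r"\b(?:19|20)\d{2}\b", "keywords": []},
    "price": {"constant": r"\$\d+(?:,\d{3})*", "keywords": ["asking", "price"]},
    "mileage": {
        "constant": r"\b\d+(?:,\d{3})*\s?(?:miles|mi)\b",
        "keywords": ["mileage"],
    },
}


def extract(text, ontology):
    """Return (constants, keywords) as lists of (attribute, matched text, offset)."""
    constants, keywords = [], []
    for attr, rule in ontology.items():
        for m in re.finditer(rule["constant"], text):
            constants.append((attr, m.group(0), m.start()))
        for kw in rule["keywords"]:
            for m in re.finditer(re.escape(kw), text, flags=re.IGNORECASE):
                keywords.append((attr, kw, m.start()))
    return constants, keywords


def recognize(constants, keywords, ontology):
    """Pick one constant per attribute, preferring the one nearest a keyword.

    The nearest-keyword rule is an assumed stand-in for the paper's recognizer.
    """
    record = {}
    for attr in ontology:
        candidates = [c for c in constants if c[0] == attr]
        if not candidates:
            record[attr] = None  # attribute not found in this document
            continue
        kw_offsets = [k[2] for k in keywords if k[0] == attr]
        if kw_offsets:
            best = min(
                candidates,
                key=lambda c: min(abs(c[2] - o) for o in kw_offsets),
            )
        else:
            best = candidates[0]  # no keyword evidence: take the first match
        record[attr] = best[1]
    return record


ad = "1995 Honda Civic, 82,000 miles, runs great. Asking $4,500 or best offer."
constants, keywords = extract(ad, ONTOLOGY)
record = recognize(constants, keywords, ONTOLOGY)
print(record)
```

In this sketch, switching to another domain means replacing only `ONTOLOGY`, while `extract` and `recognize` stay unchanged, which mirrors the abstract's point that the processes are fixed and only the ontological description varies.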

References


	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
1998 OntBasedExtrAndStructOfInforFromDataRichUnstrDocs	David W. Embley Douglas M. Campbell Randy D. Smith Stephen W. Liddle			Ontology-based Extraction and Structuring of Information from Data-Rich Unstructured Documents		Proceedings of the seventh International Conference on Information and Knowledge Management	http://www.cs.sunysb.edu/~cse671/files/19.pdf	10.1145/288627.288641		1998