2015 DeepDiveADataManagementSystemfo
- (Zhang, 2015) ⇒ Ce Zhang. (2015). “DeepDive: A Data Management System for Automatic Knowledge Base Construction.” In: PhD thesis Journal.
Subject Headings: DeepDive System, Knowledge Base Construction.
Notes
Cited By
Quotes
Abstract
Many pressing questions in science are macroscopic: they require scientists to consult information expressed in a wide range of resources, many of which are not organized in a structured relational form. Knowledge base construction (KBC) is the process of populating a knowledge base, i.e., a relational database storing factual information, from unstructured inputs. KBC holds the promise of facilitating a range of macroscopic sciences by making information accessible to scientists.
One key challenge in building a high-quality KBC system is that developers must often deal with data that are both diverse in type and large in size. Further complicating the scenario is that these data need to be manipulated by both relational operations and state-of-the-art machine-learning techniques. This dissertation focuses on supporting this complex process of building KBC systems. DeepDive is a data management system that we built to study this problem; its ultimate goal is to allow scientists to build a KBC system by declaratively specifying domain knowledge without worrying about any algorithmic, performance, or scalability issues.
DeepDive was built by generalizing from our experience in building more than ten high-quality KBC systems, many of which exceed human quality or are top-performing systems in KBC competitions, and many of which were built completely by scientists or industry users using DeepDive. From these examples, we designed a declarative language to specify a KBC system and a concrete protocol that iteratively improves the quality of KBC systems. This flexible framework introduces challenges of scalability and performance: many KBC systems built with DeepDive contain statistical inference and learning tasks over terabytes of data, and the iterative protocol also requires executing similar inference problems multiple times. Motivated by these challenges, we designed techniques that make both the batch execution and incremental execution of a KBC program up to two orders of magnitude more efficient and/or scalable. This dissertation describes the DeepDive framework, its applications, and these techniques, to demonstrate the thesis that it is feasible to build an efficient and scalable data management system for the end-to-end workflow of building KBC systems.
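As an illustration only (not taken from the thesis), the basic idea of KBC described in the abstract, populating a relational table from unstructured text, can be sketched in a few lines of Python. Everything here is an assumption made for the example: the `spouse` relation, the `build_kb` and `query_spouses` helpers, and the hand-written regular expression. DeepDive itself is described as using a declarative language with statistical inference and learning, which this deterministic toy does not attempt to reproduce.

```python
import re
import sqlite3

# Hypothetical extraction pattern: "<Name> is married to <Name>".
# A name is one or more capitalized words; this is a toy rule, not DeepDive's.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
PATTERN = re.compile(rf"({NAME}) is married to ({NAME})")


def build_kb(sentences):
    """Populate an in-memory relational table from free-text sentences."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE spouse ("
        "person1 TEXT, person2 TEXT, UNIQUE(person1, person2))"
    )
    for sentence in sentences:
        for person1, person2 in PATTERN.findall(sentence):
            # INSERT OR IGNORE keeps each extracted fact only once.
            conn.execute(
                "INSERT OR IGNORE INTO spouse VALUES (?, ?)",
                (person1, person2),
            )
    conn.commit()
    return conn


def query_spouses(conn):
    """Return all extracted (person1, person2) facts, sorted by person1."""
    return conn.execute(
        "SELECT person1, person2 FROM spouse ORDER BY person1"
    ).fetchall()
```

Calling `build_kb` on a list of sentences and then `query_spouses` on the returned connection yields the extracted facts as rows, which can then be queried with ordinary SQL. A real KBC system would replace the fixed pattern with learned, probabilistic extraction.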
References
Page | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
---|---|---|---|---|---|---|---|---|---|---|
2015 DeepDiveADataManagementSystemfo | Ce Zhang | | | DeepDive: A Data Management System for Automatic Knowledge Base Construction | | | | | | 2015 |