Weakly-Supervised Information Extraction Task

References

(Banko, 2009) ⇒ Michele Banko. (2009). “Open Information Extraction for the Web.” PhD Thesis, University of Washington.
- Brin [11], Agichtein and Gravano [1], Riloff and Jones [70], Pasca et al. [62], and Bunescu and Mooney [12] sought to reduce the amount of manual labor necessary to perform relation-specific extraction. Rather than demand hand-tagged corpora, these weakly-supervised IE systems required a user to specify relation-specific knowledge in the form of a small set of seed instances known to satisfy the relation of interest. For instance, by specifying the pairs (Microsoft, Redmond), (Exxon, Irving) and (Intel, Santa Clara) these IE systems learned patterns (e.g. <X>’s headquarters in <Y> and <Y>-based <X>) that identified additional pairs of company names and locations satisfying the Headquarters(X, Y) relation. While these systems reduced the amount of required labeled inputs by a significant amount, and can achieve levels of precision and recall on par with fully-supervised systems, the remaining amount of labeling effort becomes non-trivial when the goal is to extract instances of thousands of relations.
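
The seed-and-pattern loop described in the passage above can be illustrated with a toy sketch. This is not the algorithm of any of the cited systems: the corpus sentences, the function names (`learn_patterns`, `pattern_to_regex`, `apply_patterns`), and the rule that an entity name is a run of capitalized words are all assumptions made for illustration. It shows one round only: seed pairs are located in sentences, the text spanning them is turned into a template, and the templates are matched against the corpus to propose new pairs.

```python
import re
from collections import Counter

# Seed pairs from the quoted example: (company, location).
SEEDS = {("Microsoft", "Redmond"), ("Exxon", "Irving"), ("Intel", "Santa Clara")}

# Hypothetical toy corpus, invented for this sketch.
CORPUS = [
    "Microsoft's headquarters in Redmond opened a new campus.",
    "Redmond-based Microsoft announced earnings.",
    "Exxon's headquarters in Irving is large.",
    "Google's headquarters in Mountain View hosts visitors.",
    "Seattle-based Amazon hired engineers.",
]

# Assumption: an entity name is one or more capitalized words.
NAME = r"[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*"


def learn_patterns(corpus, seeds):
    """Count templates spanning a seed pair, e.g. "<Y>-based <X>"."""
    patterns = Counter()
    for sentence in corpus:
        for x, y in seeds:
            if x in sentence and y in sentence:
                template = sentence.replace(x, "<X>").replace(y, "<Y>")
                # Keep only the span from the first placeholder to the last.
                match = re.search(r"<[XY]>.*<[XY]>", template)
                if match:
                    patterns[match.group(0)] += 1
    return patterns


def pattern_to_regex(pattern):
    """Turn a template into a regex with named groups x and y."""
    escaped = re.escape(pattern)
    escaped = escaped.replace(re.escape("<X>"), f"(?P<x>{NAME})")
    escaped = escaped.replace(re.escape("<Y>"), f"(?P<y>{NAME})")
    return re.compile(escaped)


def apply_patterns(corpus, patterns):
    """Return every (x, y) pair matched by any learned template."""
    pairs = set()
    for pattern in patterns:
        regex = pattern_to_regex(pattern)
        for sentence in corpus:
            for match in regex.finditer(sentence):
                pairs.add((match.group("x"), match.group("y")))
    return pairs


learned = learn_patterns(CORPUS, SEEDS)
new_pairs = apply_patterns(CORPUS, learned) - SEEDS
print(sorted(learned))
print(sorted(new_pairs))
```

Systems of this kind typically also score patterns and candidate pairs and repeat the loop with the newly accepted pairs as seeds; this sketch omits both steps, so every learned template is trusted equally. The per-relation cost the passage points to is visible here too: each new relation needs its own `SEEDS` set.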