Positive Sentence Harvesting Task
A Positive Sentence Harvesting Task is a Knowledge Harvesting Task that requires the identification of sentences that contain the sought Semantic Relation.
- Context:
- Input: Training Cases.
- Output: Sentences that match the training cases.
- It is typically performed by assuming that any sentence that contains both entities of a training case (a relation instance) expresses the sought relation (see the sketch below).
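A minimal sketch of this co-occurrence assumption is shown below in Python. The corpus sentences and the harvest_positive_sentences() helper are invented for illustration; this is a sketch of the general idea, not the implementation of any particular system.

```python
from typing import Iterable, List, Tuple

def harvest_positive_sentences(
    training_cases: Iterable[Tuple[str, str]],
    sentences: Iterable[str],
) -> List[Tuple[Tuple[str, str], str]]:
    """Return (training_case, sentence) pairs for every sentence that
    mentions both entities of a training case, i.e. the naive
    co-occurrence assumption described above."""
    cases = list(training_cases)  # allow re-scanning the cases for each sentence
    matches: List[Tuple[Tuple[str, str], str]] = []
    for sentence in sentences:
        for entity_1, entity_2 in cases:
            if entity_1 in sentence and entity_2 in sentence:
                matches.append(((entity_1, entity_2), sentence))
    return matches

# Toy corpus (invented sentences, for illustration only).
corpus = [
    "Alan Turing had an elder brother named John Turing.",
    "Alan Turing is regarded as a founder of computer science.",
]
print(harvest_positive_sentences([("Alan Turing", "John Turing")], corpus))
# -> [(('Alan Turing', 'John Turing'), 'Alan Turing had an elder brother named John Turing.')]
```

Only the first toy sentence is harvested, since it is the only one that mentions both entities of the training case.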
- Challenges:
- 1) Poor Matches Challenge
A challenge with the Positive Sentence Harvesting phase is that the Training Cases may lead to the identification of sentences that contain a different relation than the one sought. One reason for this outcome is that the entities in the training cases may be ambiguous. For the PEOPLE entity type, for example, the name John Smith will match many individuals. If IsSiblingTo() is the sought Semantic Relation, then a training case such as (John Smith, Jane Smith) will match many sentences that refer to different people than the ones intended. John Smith and Jane Smith may instead be in an IsParentTo() relation, or may simply be acquaintances with the same last name (a short sketch after the Challenges list illustrates this). However, this problem is likely not a significant one, in that most training cases will uniquely identify sentences with the correct relation. For the pair (Alan Turing, John Turing), for example, only correct sentences would be retrieved (if any). … This problem could be alleviated by Multi-document Coreference Resolution.
- 2) No Matches Challenge
A challenge with the Positive Sentence Harvesting phase is that the training cases given may not lead to the identification of any sentences. For the pair (Alan Turing, John Turing), for example, a Google search found no sentences containing both names. … This problem could be alleviated by Discourse-level Analysis, which may, for example, recover the relation from a passage such as the one quoted below (see the sketch that follows the passage):
- “Alan Mathison Turing was born on 23rd June 1912 to Julius Mathison Turing and Ethel Sara Turing. Alan was the second child of the couple; he had an elder brother named John. Alan Turing is now thought of as the godfather of modern computing science but he was unappreciated in his own lifetime particularly in the early years. He was born into the British upper middle class where science was looked down upon rather boys should grow up to become lawyers as John Turing did.”
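The sketch below illustrates the No Matches Challenge and how a discourse-level view could help. It uses an abridged version of the passage quoted above; the split on ". " and the whole-passage co-occurrence check are crude stand-ins, for illustration only, for real sentence segmentation and for Discourse-level Analysis or coreference resolution.

```python
# Abridged from the example passage quoted above.
passage = (
    "Alan Turing is now thought of as the godfather of modern computing "
    "science but he was unappreciated in his own lifetime. He was born into "
    "the British upper middle class where science was looked down upon; "
    "rather, boys should grow up to become lawyers, as John Turing did."
)
pair = ("Alan Turing", "John Turing")

# Sentence-level matching (the naive harvester) finds nothing: no single
# sentence contains both names.  Splitting on ". " is a crude segmenter.
sentences = [s.strip() for s in passage.split(". ") if s.strip()]
sentence_hits = [s for s in sentences if pair[0] in s and pair[1] in s]
print(sentence_hits)  # -> []

# A passage-level co-occurrence check (a stand-in for Discourse-level
# Analysis or coreference resolution) does surface the text as a candidate.
if pair[0] in passage and pair[1] in passage:
    print("Candidate passage found for", pair)
```

No single sentence contains both "Alan Turing" and "John Turing", so the naive harvester returns nothing; only the passage-level check surfaces the text as a candidate for IsSiblingTo().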
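The Poor Matches Challenge can be illustrated in the same way. In the self-contained sketch below, the corpus sentences are invented and describe people named John Smith and Jane Smith who are not siblings; the naive co-occurrence test nevertheless harvests both sentences for the IsSiblingTo() training case.

```python
# Invented sentences in which John Smith and Jane Smith are NOT siblings.
ambiguous_corpus = [
    "John Smith and his daughter Jane Smith opened the family shop in 1998.",
    "Jane Smith thanked her colleague John Smith for reviewing the report.",
]
training_case = ("John Smith", "Jane Smith")  # intended as IsSiblingTo()

# The naive co-occurrence test (as in the harvester sketch above) accepts
# both sentences, although neither expresses IsSiblingTo().
harvested = [
    s for s in ambiguous_corpus
    if training_case[0] in s and training_case[1] in s
]
for sentence in harvested:
    print(training_case, "->", sentence)
```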
- Example(s):
- If the pair (Alan Turing, John Turing) is an IsSiblingTo() training case, for example, the task will look for sentences that contain both names. This example would likely lead to few, if any, such sentences.
- If the pair (John Smith, Jane Smith) is an IsSpouseTo() training case, for example, the task will look for sentences that contain both names. This example would likely lead to many erroneous sentences.
- …
- Counter-Example(s):
- See: Relation Recognition, Snowball, LEILA.
References
2018
- (Chisholm, 2018) ⇒ Andrew Chisholm. (2018). "Web Knowledge Bases." PhD Thesis, School of Information Technologies, University of Sydney.
- QUOTE: KBs have been used to investigate the interaction between structured facts and unstructured text. Generating textual templates that are filled by structured data is a common approach and has been used for conversational text (Han et al., 2015) and biographical text generation (Duma and Klein, 2013). Wikipedia has also been a popular resource for studying biography, including sentence harvesting and ordering (Biadsy et al., 2008), unsupervised discovery of distinct sequences of life events (Bamman and Smith, 2014) and fact extraction from text (Garera and Yarowsky, 2009).