PPLRE Research Question
A PPLRE Research Question is an applied NLP research question that has been recognized through the PPLRE Project.
- Context:
- It can (often) involve discovery of ways to improve Accuracy.
- Such as by improving Recall by, for example, addressing scenarios where current Relation Recognition Algorithms fail to predict any response (False Negative). Examples include Multi-Sentence Relation Recognition and Implicit Named Entities.
- Such as by improving Precision by, for example, addressing scenarios where current Relation Recognition Algorithms make an incorrect prediction (False Positive). Examples include sentences that include more than one relation and sentences that include a misclassified named entity.
- …
- Counter-Example(s):
- See: Relation Recognition Task.
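The Precision and Recall notions used in the Context above can be made concrete with a minimal sketch. It assumes that predicted and gold-standard relations are represented as hashable tuples; the function name `precision_recall` is illustrative and not part of the PPLRE Project.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall over sets of relation tuples.

    Assumption: each relation is a hashable tuple, e.g.
    (organism, protein, location).
    """
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)    # correct predictions
    false_pos = len(predicted - gold)   # wrong predictions hurt precision
    false_neg = len(gold - predicted)   # missed relations hurt recall
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if gold else 0.0
    return precision, recall


# Toy data: one correct prediction, one wrong prediction, one missed relation.
gold = {("P. aeruginosa", "LasA protease", "cytoplasm"),
        ("P. aeruginosa", "LasA protease", "outer membrane")}
predicted = {("P. aeruginosa", "LasA protease", "cytoplasm"),
             ("P. aeruginosa", "LasA protease", "nucleus")}
p, r = precision_recall(predicted, gold)
```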
Examples
1) PPLRE Research Topics - Document-based Analysis
- Synopsis: Most current state-of-the-art Relation Recognition Algorithms discover Semantic Relations by treating the Corpus as a Bag of Sentences (i.e. they perform Sentence-level Analysis).
- Precision could be improved by jointly analysing all of the sentences in a document (aka Discourse-level Analysis).
- Possible approaches include: 1) The identification of the same relation expressed in another part of the document. If a relation is repeated in a document then it is more likely to be correct, which would reinforce the confidence in the relation candidate. 2) Similarly, the identification of a "conflicting relation" elsewhere in the document may diminish our confidence in the prediction.
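The two approaches above can be sketched as a document-level rescoring step. This is a sketch under stated assumptions, not an established PPLRE method: the additive `boost` and `penalty` weights are hypothetical, and what counts as a "conflicting relation" is left to a caller-supplied `is_conflict` function because the source does not define it.

```python
def rescore_document(candidates, is_conflict, boost=0.1, penalty=0.1):
    """Adjust relation confidences using the other candidates in one document.

    candidates:  list of (relation_tuple, confidence) from a single document.
    is_conflict: hypothetical predicate deciding whether two relations conflict.
    boost/penalty: hypothetical additive weights, not tuned values.
    """
    rescored = []
    for i, (relation, confidence) in enumerate(candidates):
        others = [r for j, (r, _) in enumerate(candidates) if j != i]
        repeats = sum(1 for r in others if r == relation)
        conflicts = sum(1 for r in others if is_conflict(relation, r))
        adjusted = confidence + boost * repeats - penalty * conflicts
        # Keep the score inside [0, 1].
        rescored.append((relation, min(1.0, max(0.0, adjusted))))
    return rescored


# Toy conflict rule for a one-to-many relation: same company, different city.
def different_city(a, b):
    return a[0] == b[0] and a[1] != b[1]


doc_candidates = [
    (("AcmeCorp", "Boston"), 0.6),
    (("AcmeCorp", "Boston"), 0.5),
    (("AcmeCorp", "Denver"), 0.7),
]
rescored = rescore_document(doc_candidates, different_city)
```

In this toy document the repeated Boston candidates each gain one boost and lose one penalty, while the lone Denver candidate is penalized twice. The conflict rule shown would be too strict for a Many-to-Many Relation such as the PPLRE task.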
2) PPLRE Research Topics - Relations across Multiple Sentences
- Synopsis: Most current state-of-the-art Relation Recognition Algorithms only discover Semantic Relations that are contained within a single sentence. Recall performance could be improved by identifying relations that are expressed across multiple sentences. For example, in a biomedical document an organism is often identified early in the document and no longer explicitly restated in later sentences that mention one of its proteins.
- Possible approaches include: the addition of Anaphora Resolution and Coreference Resolution on named entities, building a Text Graph that joins on these entities, and then performing a search on the graph. Note that the spread of relations across multiple sentences is also more likely to occur when relations involve more than two entities (see Ternary Relations below).
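The Text Graph approach above can be sketched as a bipartite graph of sentences and entities, searched breadth-first. The sketch assumes that Anaphora and Coreference Resolution have already mapped every mention to a canonical entity name; the function names and the toy document are illustrative only.

```python
from collections import defaultdict, deque


def build_text_graph(sentences):
    """Link each sentence node to the canonical entities it mentions.

    sentences: list of sets of canonical entity names, one set per sentence
    (assumed output of a coreference resolution step).
    """
    graph = defaultdict(set)
    for idx, entities in enumerate(sentences):
        sentence_node = ("sentence", idx)
        for name in entities:
            entity_node = ("entity", name)
            graph[sentence_node].add(entity_node)
            graph[entity_node].add(sentence_node)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph[node]):  # sorted for determinism
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Toy document: the organism is named early, and an anaphor in sentence 1
# (already resolved here) ties it to the protein; the location follows later.
doc = [
    {"P. aeruginosa"},
    {"P. aeruginosa", "LasA protease"},
    {"LasA protease", "outer membrane"},
]
graph = build_text_graph(doc)
path = shortest_path(graph, ("entity", "P. aeruginosa"),
                     ("entity", "outer membrane"))
```

A path that passes through more than one sentence node signals a candidate relation that no single sentence states in full.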
3) PPLRE Research Topics - Ternary Relations
- Synopsis: Recent research has focused on unary relations (NER) such as Composer(C) and on binary relations such as OrgHeadquarterLocation(O,L). The PPLRE task, however, is a ternary relation OPL(Organism, Protein, Location). Furthermore, N-ary Relations are commonplace; for example, Event Relations typically unite two Concepts with a Temporal Relation. While it is possible to divide an n-ary relation into binary relations, a unified approach would have access to more information. Possible approaches include the casting of the document into a Text Graph and then the identification of ternary patterns.
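One way to see why a unified approach has access to more information is that splitting a ternary relation into two binary relations can be lossy. The sketch below uses made-up organism and protein names, and the helper names are illustrative: it splits OPL triples into Organism/Protein and Protein/Location pairs, then joins them back on the shared protein.

```python
def split_ternary(triples):
    """Divide (organism, protein, location) triples into two binary relations."""
    organism_protein = {(o, p) for o, p, _ in triples}
    protein_location = {(p, l) for _, p, l in triples}
    return organism_protein, protein_location


def join_binary(organism_protein, protein_location):
    """Recombine the two binary relations on the shared protein."""
    return {(o, p, l)
            for o, p in organism_protein
            for p2, l in protein_location
            if p == p2}


# Toy data: the same protein name occurs in two organisms at different locations.
triples = {
    ("OrgA", "ProtX", "cytoplasm"),
    ("OrgB", "ProtX", "outer membrane"),
}
op, pl = split_ternary(triples)
rejoined = join_binary(op, pl)
spurious = rejoined - triples  # triples that were never stated
```

Here the join returns the two original triples plus two spurious ones, because the binary relations no longer record which organism goes with which location.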
4) PPLRE Research Topics - Sentences with Many Relations
- Synopsis: Past Relation Recognition Algorithms have been applied mainly to tasks where the Sentences contain at most one instance of the sought relation and few, if any, extraneous entities to confound the pattern search. A sentence with a Company/Headquarter proposition typically will not mention more than one such relation, nor mention other companies or locations. There is an opportunity to improve performance in terms of both recall and precision in domains, such as PPLRE, whose corpus consists of summarized information written for a technical domain. One idea is to build a model that can predict whether two entities share in all relations stated in the sentence.
5) PPLRE Research Topics - Many-to-many Relations
- Synopsis: Some Relation Recognition Algorithms, particularly those that Bootstrap, are optimized for One-to-Many Relations. In the Organization/Location relation, for example, Snowball exploits the fact that Relation Recognition Patterns that associate a company with a different city must be rejected. The PPLRE task, however, involves a Many-to-Many Relation: a Protein can be located in many Cellular Compartments, and vice versa. The “LasA protease” protein of “P. aeruginosa”, for example, can be located in both the “cytoplasm” and the “outer membrane”. Ideas include … <tbd>.
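The difficulty can be illustrated with a simplified pattern-confidence score in the spirit of Snowball's positive/negative counting. This is a sketch of the problem, not a proposed solution and not Snowball's actual implementation; the `functional` flag and the data layout are assumptions made for illustration.

```python
def pattern_confidence(extractions, seeds, functional=True):
    """Score a pattern as positive / (positive + negative) matches.

    extractions: list of (key, value) pairs produced by one pattern.
    seeds:       dict mapping a key to the set of values already known for it.
    functional:  if True, assume each key has a single correct value, so an
                 unseen value for a known key counts as a negative match.
    """
    positive = negative = 0
    for key, value in extractions:
        known = seeds.get(key)
        if known is None:
            continue  # unknown key: no evidence either way
        if value in known:
            positive += 1
        elif functional:
            negative += 1
    total = positive + negative
    return positive / total if total else 0.0


# Both extractions are correct for the PPLRE task, but only one is a seed.
seeds = {"LasA protease": {"cytoplasm"}}
extractions = [("LasA protease", "cytoplasm"),
               ("LasA protease", "outer membrane")]
strict = pattern_confidence(extractions, seeds, functional=True)
relaxed = pattern_confidence(extractions, seeds, functional=False)
```

Under the one-value-per-key assumption the correct "outer membrane" extraction halves the pattern's score. Dropping the assumption avoids that penalty but also removes every source of negative evidence, which is why the Many-to-Many case remains open.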
6) PPLRE Research Topics - Long-distance Sentence Patterns
- Synopsis: Current research assumes that a relationship statement does not involve many intervening words between the entities. The PPLRE Task, however, involves documents with long sentences. An idea is to divide the sentence into chunks, some of which can be discarded. Sources of information include Semantic Role Labeling and Discourse Relations.
Miscellaneous
- Positive Sentence Harvesting: One innate difficulty is gathering the correct Sentences during the Positive Sentence Harvesting phase. For the IsSiblingTo() Semantic Relation, for example, the Training Cases could include the pair (John Smith,Jane Smith). This pair will match many gathered sentences that refer to the wrong Concept Instance. E.g. John Smith and Jane Smith may be in an IsParentTo() relation or may simply be acquaintances with the same last name. However, this problem is likely not a significant one, because most of the training cases will probably identify only sentences that express the correct relation. For the pair (Alan Turing,John Turing), for example, only correct sentences would be retrieved (if any).
- PPLRE Research Topics - Implicit-entity Resolution
- Synopsis: Most Semantic Relation Algorithms require that all of the relation's Named Entities be explicitly identified in advance. Often one or more of the entities is only implicitly stated. In the PPLRE Task, for example, the Cellular Location is often expressed in terms of some action, such as the verb "secreted" representing an extracellular location. One way to address this challenge is to build a model that predicts keywords that imply the entity.
- PPLRE Research Topics - Named Entity Relabeling
- Synopsis: One of the challenges in the PPLRE domain arises from the weak accuracy of Named Entity Recognition of Proteins. Many proteins are not labeled and many are mislabeled. One idea for relieving this problem is to use the discovered Relation Recognition Patterns to find mislabeled entities. A high-precision pattern could be used to correct a mistake made by the NER procedure. The improved data could then be used to train a new model.
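The relabeling idea can be sketched as follows. Everything specific here is an assumption for illustration: the token/label representation, the `PROTEIN`, `LOCATION` and `O` label names, the example context words, and the precision figures. A real system would draw the pattern and its measured precision from the discovered Relation Recognition Patterns.

```python
def relabel_with_pattern(tokens, labels, context, new_label="PROTEIN",
                         pattern_precision=0.95, min_precision=0.9):
    """Relabel the token before `context` when a LOCATION follows it.

    tokens, labels:    parallel lists from a hypothetical NER step.
    context:           list of words forming the pattern's middle part.
    pattern_precision: hypothetical measured precision of this pattern.
    min_precision:     hypothetical threshold below which NER is left untouched.
    """
    labels = list(labels)  # do not modify the caller's list
    if pattern_precision < min_precision:
        return labels
    n = len(context)
    for i in range(1, len(tokens) - n):
        if tokens[i:i + n] == context and labels[i + n] == "LOCATION":
            labels[i - 1] = new_label  # correct a missed or wrong label
    return labels


# Toy sentence in which NER missed the protein mention.
tokens = ["LasA", "is", "found", "in", "the", "cytoplasm"]
labels = ["O", "O", "O", "O", "O", "LOCATION"]
context = ["is", "found", "in", "the"]
fixed = relabel_with_pattern(tokens, labels, context)
```

The corrected label sequences could then serve as additional training data for a new NER model, as the synopsis suggests.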