PPLRE Automated Evaluation System
A PPLRE Automated Evaluation System is a Software System that addresses the PPLRE Automated Evaluation Task.
- See: PPLRE Project.
Overview
The PPLRE Automated Evaluation System is the system used within the PPLRE Project to evaluate the Performance of the PPLRE Relation Extraction Algorithms, particularly with respect to the Correctness Metrics of Precision and F-Score. The evaluation lets us decide which predicted OPL relations are most likely to be accurate and are therefore the best candidates for the Domain Experts to review during the PPLRE Manual Evaluation Task.
Algorithm Performance Evaluation
Current Performance Ranking
Measure | Ensemble Zprsr&Snwbl 070322 | Ensemble Any Two 070322 | Zparser 0700404 (opt) | Zparser 070404 | Snowball 070309 | Snowball 070303 (opt) | NNeighbor 07040510 (opt) | NNeighbor 07040423 | Cooccur 070403 |
TP | 9 | 16 | 17 | 17 | 12 | 6 | 13 | 38 | 40 |
FP | 0 | 1 | 10 | 14 | 6 | 12 | 8 | 152 | 243 |
FN | 56 | 49 | 48 | 48 | 53 | 59 | 52 | 85 | 25 |
TN | 138 | 134 | 144 | 145 | 140 | 27 | 80 |   |   |
Precision | 100.0% | 94.1% | 63.0% | 54.8% | 66.6% | 33.3% | 61.9% | 20.0% | 14.1% |
Recall | 13.8% | 24.6% | 26.2% | 26.2% | 18.2% | 9.2% | 20.0% | 58.5% | 61.5% |
Fscore | 24.3% | 39.0% | 37.0% | 35.4% | 28.6% | 14.4% | 30.2% | 29.8% | 23.0% |
PPLRE Evaluation - ZParser
PPLRE Evaluation - Nearest Neighbor
PPLRE Evaluation - Snowball
PPLRE Evaluation - Cooccurrence
PPLRE Evaluation - Ensemble
Requirements
One of the more ideal measures of Performance during this phase is the amount of time that a Domain Expert spends validating a fixed set of OPL relations: for example, the time required to validate 100 predicted relations divided by the number of True Positive predictions among them. For now, though, we will assume that every prediction, good or bad, takes equally long to evaluate. Under this assumption the expert time spent per True Positive is inversely proportional to Precision, so we can focus on the precision of the algorithms.
We will also evaluate the F-Score in order to keep track of each algorithm's ability to extract a majority of the relations present in the test set.
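The two measures above can be computed directly from the TP, FP and FN counts. The following is a minimal sketch, not the project's own code: the function names are illustrative, and `reviews_per_true_positive` assumes, as stated above, that every prediction takes equally long to review.

```python
def precision(tp, fp):
    """Fraction of predicted relations that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of relations in the test set that were predicted."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def reviews_per_true_positive(tp, fp):
    """Predictions an expert reviews per correct relation found.

    Assumes equal review time per prediction, so this is 1 / precision.
    """
    return (tp + fp) / tp if tp else float("inf")


# Counts taken from the first column of the ranking table (TP=9, FP=0, FN=56).
print(f"{precision(9, 0):.1%}")    # 100.0%
print(f"{recall(9, 56):.1%}")      # 13.8%
print(f"{f_score(9, 0, 56):.1%}")  # 24.3%
```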
Requirements: Input Data
Currently the focus of evaluation is on PPLRE Curated Data v1.3.
Requirements: Output Data
- True Positive: This predicted relation in fact exists in the test corpus (in this document (in this passage))
- False Positive: This predicted relation does NOT exist in the test corpus (in this document (in this passage))
- False Negative: This relation in the test corpus (in this document (in this passage)) was NOT predicted.
- True Negative: This relation was neither predicted nor is it in the test corpus.
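The four outcomes can be illustrated with plain set operations. This is a sketch under simplifying assumptions: relations are compared as exact (organism, protein) string pairs, whereas the real evaluation also supports partial name and partial relationship matching, and the optional `candidates` argument (the universe of relations that could have been predicted) is a hypothetical parameter needed only for True Negatives.

```python
def classify(predicted, gold, candidates=None):
    """Split relations into TP / FP / FN (and TN when candidates is given)."""
    predicted, gold = set(predicted), set(gold)
    outcome = {
        "TP": predicted & gold,  # predicted and present in the test corpus
        "FP": predicted - gold,  # predicted but absent from the test corpus
        "FN": gold - predicted,  # present in the test corpus but not predicted
    }
    if candidates is not None:
        # True negatives only make sense relative to a candidate universe.
        outcome["TN"] = set(candidates) - predicted - gold
    return outcome


# Pairs borrowed from the sample output table for illustration.
predicted = [("Pseudomonas aeruginosa", "OprD2"), ("Neisseria gonorrhoeae", "OprD2")]
gold = [("Pseudomonas aeruginosa", "OprD2"), ("L. pneumophila", "PlaC")]
result = classify(predicted, gold)
print({label: len(pairs) for label, pairs in result.items()})
```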
Algorithm Output File Format
The file format is described by way of example. Below is a sample of the output data for the OP() relation. For the PL() relation the ORGANISM and PROTEIN columns would be replaced by the PROTEIN and LOCATION columns.
P.ORGANISM | P.PROTEIN | P.CONFIDENCE | A.TUPLE_ID | A.PSID | A.SENTENCE_ID | A.ORGANISM | A.PROTEIN | OUTCOME | OUTCOME.Partial |
Pseudomonas aeruginosa | OprD2 | -78.65165722 |   | 3761 | 0 | Pseudomonas aeruginosa | OprD2 | TP | TP |
Escherichia coli | cytochrome c | -84.08272563 |   | 11341 | 6 | Escherichia coli | "cytochrome, cytochrome c" | TP | TP |
Halobacterium saccharovorum | atpase | -98.48214753 |   | 311 | 0 | Halobacterium saccharovorum | ATPase | TP | TP |
Escherichia coli | Tsh | -100.032466 |   | 3131 | 0 | Escherichia coli | Tsh | TP | TP |
Neisseria gonorrhoeae | Fe-regulated protein | -100.7413244 |   | 10331 | 0 | Neisseria gonorrhoeae | "Fe_Regulated protein, FrpB" | TP | TP |
Legionella pneumophila | FlaA | -105.8240252 |   | 1551 | 0 | Legionella pneumophila | FlaA | TP | TP |
L. pneumophila | flaA | -130.5143263 |   | 1551 | 1 | L. pneumophila | flaA | TP | TP |
Pseudomonas aeruginosa | phospholipase C | -138.1919905 |   | 6581 | 0 | Pseudomonas aeruginosa | "PLC, phospholipase C, lipase" | TP | TP |
P. tunicata | AlpP | -148.7812059 |   | 3381 | 1 | P. tunicata | AlpP | TP | TP |
  |   |   |   | 7361 | 5 | L. pneumophila | PlaC | FN | TP |
  |   |   |   | 1341 | 3 | Escherichia coli | crcA | FN | TP |
  |   |   |   | 6171 | 2 | S. marcescens | HasD | FN | TP |
  |   |   |   | 6061 | 6 | T. pallidum | 47-kDa lipoprotein | FN | FN |
  |   |   |   | 491 | 2 | Escherichia coli | "NADH oxidase, ExbD, ExbB" | FN | TP |
Neisseria gonorrhoeae | OprD2 | -100.7413244 |   | 123 | 1 |   |   | FP | FP |
Pseudomonas aeruginosa | flaB | -114.2801257 |   | 321 | 1 |   |   | FP | FP |
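A file in this format can be summarized by counting the values in the OUTCOME and OUTCOME.Partial columns. The sketch below assumes that the file on disk is tab-delimited with the header row shown above; the delimiter is an assumption (the table here is only a rendering), and `tally_outcomes` is an illustrative name rather than part of the PPLRE code.

```python
import csv
import io
from collections import Counter


def tally_outcomes(handle, delimiter="\t"):
    """Count strict and partial outcomes in an algorithm output file."""
    strict, partial = Counter(), Counter()
    for row in csv.DictReader(handle, delimiter=delimiter):
        strict[row["OUTCOME"]] += 1
        partial[row["OUTCOME.Partial"]] += 1
    return strict, partial


# Four rows copied from the sample above, re-encoded as tab-delimited text.
header = ["P.ORGANISM", "P.PROTEIN", "P.CONFIDENCE", "A.TUPLE_ID", "A.PSID",
          "A.SENTENCE_ID", "A.ORGANISM", "A.PROTEIN", "OUTCOME", "OUTCOME.Partial"]
rows = [
    ["Pseudomonas aeruginosa", "OprD2", "-78.65165722", "", "3761", "0",
     "Pseudomonas aeruginosa", "OprD2", "TP", "TP"],
    ["", "", "", "", "7361", "5", "L. pneumophila", "PlaC", "FN", "TP"],
    ["", "", "", "", "6061", "6", "T. pallidum", "47-kDa lipoprotein", "FN", "FN"],
    ["Neisseria gonorrhoeae", "OprD2", "-100.7413244", "", "123", "1", "", "", "FP", "FP"],
]
sample = "\n".join("\t".join(fields) for fields in [header] + rows) + "\n"

strict, partial = tally_outcomes(io.StringIO(sample))
print(dict(strict))   # {'TP': 1, 'FN': 2, 'FP': 1}
print(dict(partial))  # {'TP': 2, 'FN': 1, 'FP': 1}
```

The resulting counts can then be turned into Precision, Recall and F-Score for either the strict or the partial matching column.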
Design
Binary Evaluation
cd /home/zshi1/pplre/bin/re_parsing/semi/cotrain1/
./binary_eva.pl <test_output> <gold_answer> <threshold> <eva_method#>
- test_output: binary tuples predicted by the system. The file name should contain ‘PL’ or ‘PO’. The file is tab-delimited, with columns defined as follows:
- 0: tuple_id
- 1: protein name
- 2: PSID of the protein name
- 3: sentence id of the protein name
- 4: location/organism name
- 5: PSID of loc/org name
- 6: sentence id of loc/org name
- 7: confidence score
- gold_answer: ./data/curated_data/v1.3/OPL.test.tab
- threshold: if confidence score > threshold -> positive, otherwise negative
- eva_method# (method 1 is what we have agreed on)
- 0: Partial name, partial relationship matching
- 1: Partial name, full relationship matching
- 2: Full name, partial relationship matching
- 3: Full name, Full relationship matching
- Output: written to stdout in tab-delimited format, the same as the results I reported in the meeting
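To make the input conventions concrete, here is a minimal sketch of how one line of test_output could be parsed and thresholded, and how the eva_method# value maps onto the two matching choices. It is not a port of binary_eva.pl: the class and function names are hypothetical, and the tuple_id and partner PSID values in the example line are made up for illustration.

```python
from typing import NamedTuple


class BinaryPrediction(NamedTuple):
    tuple_id: str
    protein: str
    protein_psid: str
    protein_sentence_id: str
    partner: str               # location or organism name
    partner_psid: str
    partner_sentence_id: str
    confidence: float


def parse_line(line):
    """Parse one tab-delimited test_output line (columns 0 to 7)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}")
    return BinaryPrediction(*fields[:7], float(fields[7]))


def is_positive(prediction, threshold):
    """Positive only if the confidence score is strictly above the threshold."""
    return prediction.confidence > threshold


def decode_eva_method(method):
    """Map eva_method# 0..3 to the name and relationship matching modes."""
    if method not in (0, 1, 2, 3):
        raise ValueError("eva_method# must be 0, 1, 2 or 3")
    return {"full_name": method >= 2, "full_relationship": method % 2 == 1}


# Illustrative line; only the names and the score come from the sample table.
line = "1\tOprD2\t3761\t0\tPseudomonas aeruginosa\t3761\t0\t-78.65165722\n"
prediction = parse_line(line)
print(is_positive(prediction, -100.0))  # True
print(decode_eva_method(1))  # {'full_name': False, 'full_relationship': True}
```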
Ternary Evaluation
% cd /home/zshi1/pplre/bin/re_parsing/semi/cotrain1/
% ./ternary_eva.pl <PL_output> <PO_output> <gold_answer> <threshold> <eva_method#>
- PL_output & PO_output: binary predictions of PL and PO. Same format as test_output above.
- The remaining parameters are the same as for the binary evaluation.
- Output:
Wishlist Requirements
- tbd