Overview

The PPLRE Automated Evaluation System is used within the PPLRE Project to evaluate the Performance of the PPLRE Relation Extraction Algorithms, particularly with respect to the Correctness Metrics of Precision and F-Score. The evaluation lets us decide which predicted OPL relations are most likely to be accurate, and are therefore the best candidates for the Domain Experts to review during the PPLRE Manual Evaluation Task.

Algorithm Performance Evaluation

Current Performance Ranking

	Ensemble Zprsr&Snwbl 070322	Ensemble Any Two 070322	Zparser 0700404 (opt)	Zparser 070404	Snowball 070309	Snowball 070303 (opt)	NNeighbor 07040510 (opt)	NNeighbor 07040423	Cooccur 070403
TP	9	16	17	17	12	6	13	38	40
FP	0	1	10	14	6	12	8	152	243
FN	56	49	48	48	53	59	52	85	25
TN			138	134	144	145	140	27	80
Precision	100.0%	94.1%	63.0%	54.8%	66.6%	33.3%	61.9%	20.0%	14.1%
Recall	13.8%	24.6%	26.2%	26.2%	18.2%	9.2%	20.0%	58.5%	61.5%
Fscore	24.3%	39.0%	37.0%	35.4%	28.6%	14.4%	30.2%	29.8%	23.0%

PPLRE Evaluation - ZParser

PPLRE Evaluation - Nearest Neighbor

PPLRE Evaluation - Snowball

PPLRE Evaluation - Cooccurrence

PPLRE Evaluation - Ensemble

Requirements

One of the more ideal measures of Performance during this phase is the amount of time that a Domain Expert might spend validating a fixed set of OPL relations: for example, the time required to validate 100 predicted relations divided by the number of True Positive predictions achieved. For now, though, we will assume that every prediction, good or bad, takes equally long to evaluate. Under this assumption the review time per True Positive is inversely proportional to precision, so we can focus on the precision of the algorithms.
We will also evaluate the F-Score in order to keep track of each algorithm's ability to extract a majority of the relations present in the test set.

Requirements: Input Data

Currently the focus of evaluation is on PPLRE Curated Data v1.3.

Requirements: Output Data

True Positive: This predicted relation does in fact exist in the test corpus (or, at a finer granularity, in this document or in this passage).
False Positive: This predicted relation does NOT exist in the test corpus (or, at a finer granularity, in this document or in this passage).
False Negative: This relation exists in the test corpus (or, at a finer granularity, in this document or in this passage) but was NOT predicted.
True Negative: This relation was neither predicted nor is it in the test corpus.
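
The four outcome counts are what the Precision, Recall and Fscore rows of the ranking table are derived from. The following is a minimal sketch of that arithmetic, assuming the Fscore is the balanced F1 measure (the reported values are consistent with it, but the page does not state the weighting). The function name is illustrative and not part of the PPLRE code base.

```python
def precision_recall_fscore(tp, fp, fn):
    """Return (precision, recall, F-score) as fractions in [0, 1].

    Assumes the balanced F1 measure; TN is not needed for these metrics.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    fscore = 2 * precision * recall / total if total else 0.0
    return precision, recall, fscore


# Counts taken from the "Ensemble Any Two 070322" column of the ranking table.
p, r, f = precision_recall_fscore(tp=16, fp=1, fn=49)
print(f"Precision {p:.1%}  Recall {r:.1%}  Fscore {f:.1%}")
# prints: Precision 94.1%  Recall 24.6%  Fscore 39.0%
```

True Negatives do not enter any of the three metrics, which is why a column with a missing TN count can still be ranked.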

Algorithm Output File Format

The file format is described by way of example. Below is a sample of the output data for the OP() relation. For the PL() relation the ORGANISM and PROTEIN columns would be replaced by the PROTEIN and LOCATION columns.

P.ORGANISM	P.PROTEIN	P.CONFIDENCE	A.TUPLE_ID	A.PSID	A.SENTENCE_ID	A.ORGANISM	A.PROTEIN	OUTCOME	OUTCOME.Partial
Pseudomonas aeruginosa	OprD2	-78.65165722	&nbsp	3761	0	Pseudomonas aeruginosa	OprD2	TP	TP
Escherichia coli	cytochrome c	-84.08272563	&nbsp	11341	6	Escherichia coli	"cytochrome, cytochrome c"	TP	TP
Halobacterium saccharovorum	atpase	-98.48214753	&nbsp	311	0	Halobacterium saccharovorum	ATPase	TP	TP
Escherichia coli	Tsh	-100.032466	&nbsp	3131	0	Escherichia coli	Tsh	TP	TP
Neisseria gonorrhoeae	Fe-regulated protein	-100.7413244	&nbsp	10331	0	Neisseria gonorrhoeae	"Fe_Regulated protein, FrpB"	TP	TP
Legionella pneumophila	FlaA	-105.8240252	&nbsp	1551	0	Legionella pneumophila	FlaA	TP	TP
L. pneumophila	flaA	-130.5143263	&nbsp	1551	1	L. pneumophila	flaA	TP	TP
Pseudomonas aeruginosa	phospholipase C	-138.1919905	&nbsp	6581	0	Pseudomonas aeruginosa	"PLC, phospholipase C, lipase"	TP	TP
P. tunicata	AlpP	-148.7812059	&nbsp	3381	1	P. tunicata	AlpP	TP	TP
&nbsp	&nbsp	&nbsp	&nbsp	7361	5	L. pneumophila	PlaC	FN	TP
&nbsp	&nbsp	&nbsp	&nbsp	1341	3	Escherichia coli	crcA	FN	TP
&nbsp	&nbsp	&nbsp	&nbsp	6171	2	S. marcescens	HasD	FN	TP
&nbsp	&nbsp	&nbsp	&nbsp	6061	6	T. pallidum	47-kDa lipoprotein	FN	FN
&nbsp	&nbsp	&nbsp	&nbsp	491	2	Escherichia coli	"NADH oxidase, ExbD, ExbB"	FN	TP
Neisseria gonorrhoeae	OprD2	-100.7413244	&nbsp	123	1	&nbsp	&nbsp	FP	FP
Pseudomonas aeruginosa	flaB	-114.2801257	&nbsp	321	1	&nbsp	&nbsp	FP	FP
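
As an illustration of how a file in this format might be consumed, the sketch below tallies the labels in the OUTCOME and OUTCOME.Partial columns. It assumes the file is tab-delimited with the header row shown above; the function name is hypothetical, and the embedded rows are abridged from the sample rather than taken from a real output file.

```python
import csv
import io
from collections import Counter


def tally_outcomes(handle, column="OUTCOME"):
    """Count the TP/FP/FN labels found in one outcome column."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[column] for row in reader)


# Four rows abridged from the sample above ("&nbsp" marks a blank cell).
SAMPLE = "\n".join([
    "P.ORGANISM\tP.PROTEIN\tP.CONFIDENCE\tA.TUPLE_ID\tA.PSID\tA.SENTENCE_ID"
    "\tA.ORGANISM\tA.PROTEIN\tOUTCOME\tOUTCOME.Partial",
    "Pseudomonas aeruginosa\tOprD2\t-78.65165722\t&nbsp\t3761\t0"
    "\tPseudomonas aeruginosa\tOprD2\tTP\tTP",
    "&nbsp\t&nbsp\t&nbsp\t&nbsp\t7361\t5\tL. pneumophila\tPlaC\tFN\tTP",
    "&nbsp\t&nbsp\t&nbsp\t&nbsp\t6061\t6\tT. pallidum\t47-kDa lipoprotein\tFN\tFN",
    "Neisseria gonorrhoeae\tOprD2\t-100.7413244\t&nbsp\t123\t1\t&nbsp\t&nbsp\tFP\tFP",
])

strict = tally_outcomes(io.StringIO(SAMPLE))
partial = tally_outcomes(io.StringIO(SAMPLE), column="OUTCOME.Partial")
print("OUTCOME:        ", dict(strict))
print("OUTCOME.Partial:", dict(partial))
```

On these four rows the strict column yields one TP, two FN and one FP, while the partial column turns one of the FN rows into a TP. The resulting counts can be fed directly into the precision and recall calculations.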

Design

Binary Evaluation

% cd /home/zshi1/pplre/bin/re_parsing/semi/cotrain1/
% ./binary_eva.pl <test_output> <gold_answer> <threshold> <eva_method#>

test_output: binary tuples predicted by the system. The file name should contain ‘PL’ or ‘PO’. The file is tab-delimited, with columns defined as follows:
- 0: tuple_id
- 1: protein name
- 2: PSID of the protein name
- 3: sentence id of the protein name
- 4: location/organism name
- 5: PSID of loc/org name
- 6: sentence id of loc/org name
- 7: confidence score

gold_answer: ./data/curated_data/v1.3/OPL.test.tab
threshold: a tuple is classified as positive if its confidence score > threshold, and as negative otherwise
eva_method#: evaluation method (method 1 is what we have agreed on)
- 0: Partial name, partial relationship matching
- 1: Partial name, full relationship matching
- 2: Full name, partial relationship matching
- 3: Full name, full relationship matching

Output: written to stdout in tab-delimited format, the same as the results I reported in the meeting.
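
To make the input format and the threshold rule concrete, here is a minimal Python sketch of reading one test_output line and classifying it. It is not the logic of binary_eva.pl itself: the class and function names, the tuple ids and the threshold value are all hypothetical, and only the column order, the strict ">" comparison and the eva_method# meanings come from the description above.

```python
from typing import NamedTuple


class BinaryTuple(NamedTuple):
    """One test_output row, in the column order 0-7 listed above."""
    tuple_id: str
    protein: str
    protein_psid: str
    protein_sentence_id: str
    partner: str              # location or organism name
    partner_psid: str
    partner_sentence_id: str
    confidence: float


# eva_method# -> (name matching, relationship matching), as listed above.
EVA_METHODS = {
    0: ("partial", "partial"),
    1: ("partial", "full"),
    2: ("full", "partial"),
    3: ("full", "full"),
}


def parse_line(line):
    """Split one tab-delimited line into a BinaryTuple."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}")
    return BinaryTuple(*fields[:7], float(fields[7]))


def is_positive(row, threshold):
    """A tuple is positive only if its confidence is strictly above threshold."""
    return row.confidence > threshold


# Hypothetical rows: the tuple ids (17, 18) are made up for illustration.
rows = [
    "17\tOprD2\t3761\t0\tPseudomonas aeruginosa\t3761\t0\t-78.65165722",
    "18\tAlpP\t3381\t1\tP. tunicata\t3381\t1\t-148.7812059",
]
threshold = -100.0  # hypothetical value
for line in rows:
    row = parse_line(line)
    label = "positive" if is_positive(row, threshold) else "negative"
    print(row.protein, row.partner, label)
```

With this threshold the first row is classified as positive and the second as negative. A row whose confidence equals the threshold exactly would be negative, because the comparison is strict.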

Ternary Evaluation

% cd /home/zshi1/pplre/bin/re_parsing/semi/cotrain1/
% ./ternary_eva.pl <PL_output> <PO_output> <gold_answer> <threshold> <eva_method#>

PL_output, PO_output: binary predictions of PL and PO, in the same format as test_output above.
The remaining parameters are the same as for the binary evaluation.
Output:

Wishlist Requirements

tbd
