PPLRE RR Algorithm - Snowball
This page describes the configuration of the Snowball algorithm for its application to the PPLRE Evaluation task.
= Notes
- For information on the evaluation of Snowball on the task, see: PPLRE Evaluation - Snowball.
- For details of Snowball, see: PPLRE Snowball Description.
- Snowball is a good system to compare against because it is a well-known information extraction system and its code is available.
- Snowball was originally tested against the OrganizationHeadquarterLocation(ORGANIZATION,HEADQUARTER,LOCATION) relation.
- Some of the challenges of applying Snowball to the PPLRE Task include:
- 1. The support of only a Binary Relation.
- 2. The restriction that relations be expressed within a single sentence. (See PPLRE Research Topics.)
- 3. The assumption that the relationship is one-to-one, e.g. headquarter(<Organization>) => single <Location>. In our task a protein can be localized in more than one region. The algorithm may be extensible to non-unique mappings if the seed examples are complete. If they are not complete, the algorithm is designed to penalize patterns that identify the missing mappings.
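The effect of the one-to-one assumption on pattern scoring can be illustrated with a small sketch. It assumes the pattern confidence measure from the published Snowball description, positive / (positive + negative), where an extracted tuple counts as negative when its key appears in the seeds with a different value and is ignored when its key is not in the seeds at all. The function name, file layout, and protein names below are made up for illustration and are not part of the Snowball code.

```shell
#!/bin/sh
# Sketch only: score one pattern under a one-to-one seed assumption.
# Input files hold tab-separated "key<TAB>value" lines; all names are made up.
score_pattern() {
    # $1 = seed tuples file, $2 = file of tuples the pattern extracted
    awk -F'\t' '
        NR == FNR { seed[$1] = $2; next }   # one value per key (one-to-one)
        ($1 in seed) {                      # keys absent from the seeds are ignored
            if (seed[$1] == $2) pos++; else neg++
        }
        END {
            if (pos + neg == 0) print "0.00"
            else printf "%.2f\n", pos / (pos + neg)
        }
    ' "$1" "$2"
}

tmp=$(mktemp -d)
# Seeds list only one location per protein.
printf 'proteinA\tcytoplasm\nproteinB\tmembrane\n' > "$tmp/seeds.tsv"
# The pattern also finds proteinA in the nucleus, which the seeds do not list.
printf 'proteinA\tcytoplasm\nproteinA\tnucleus\nproteinB\tmembrane\nproteinC\tnucleus\n' > "$tmp/extracted.tsv"
score_pattern "$tmp/seeds.tsv" "$tmp/extracted.tsv"
rm -rf "$tmp"
```

Here the pattern has two positive matches and one negative match, so the sketch prints 0.67. The (proteinA, nucleus) tuple lowers the score even if it is a correct second localization, which is the penalty described in point 3.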
Overview
To apply Snowball to the task the following approach was taken:
- The system has three phases:
- In the first iteration, the seeds are applied.
- Next, the system iterates.
- Finally, the patterns are applied to the test set.
These configuration decisions are implemented via the configuration file.
Sample Run
% ssh buster
% cd ~homira/scripts
% echo "Running PO"
% ./runall.sh 1 4
% echo "Running PL"
% ./runall.sh 2 4
Sample Run Detail
- the snowball.ini for the relation is copied over to ../snowball/conf/snowball.ini
- the randomseed.pl script is run with two parameters: 1) the relation type and 2) the number of seeds.
- Snowball is run by executing the ./snowball/runme.sh script.
- some of the input/output files (extractedTuples.xml, seedTuples.xml, processedDocIds.xml, hqPatterns.xml) are copied to the run directory.
- the ./parsingResults.pl script is run with five parameters: 1) the relation type, 2) the run directory, 3) "../data/test/", 4) the number of seeds, and 5) the run Id.
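The steps above can be sketched as a dry-run shell function that only prints the commands in order. This is not the contents of runall.sh, which are not shown on this page. It assumes that the two arguments in the sample run are the relation type (1 for PO, 2 for PL) and the number of seeds. The function name plan_run, the source location of snowball.ini, the location of the Snowball output files, and the run directory naming are all hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: prints the listed steps instead of executing them.
# Paths marked "hypothetical" are placeholders, not the real layout.
plan_run() {
    rel="$1"       # relation type (assumed: 1 = PO, 2 = PL)
    seeds="$2"     # number of seeds
    run_id="$3"    # run Id (hypothetical numbering)
    run_dir="runs/rel${rel}_seeds${seeds}_run${run_id}"   # hypothetical
    conf_src="conf/relation${rel}/snowball.ini"           # hypothetical
    out_dir="../snowball"                                 # hypothetical

    # 1. Install the relation-specific configuration.
    echo "cp ${conf_src} ../snowball/conf/snowball.ini"
    # 2. Pick the seeds: relation type, number of seeds.
    echo "./randomseed.pl ${rel} ${seeds}"
    # 3. Run Snowball.
    echo "./snowball/runme.sh"
    # 4. Keep some of the input/output files with the run.
    echo "mkdir -p ${run_dir}"
    for f in extractedTuples.xml seedTuples.xml processedDocIds.xml hqPatterns.xml; do
        echo "cp ${out_dir}/${f} ${run_dir}/"
    done
    # 5. Parse the results: relation type, run directory, test data,
    #    number of seeds, run Id.
    echo "./parsingResults.pl ${rel} ${run_dir} ../data/test/ ${seeds} ${run_id}"
}

plan_run 1 4 1
```

Printing the commands rather than running them keeps the sketch safe to try outside the original environment. Replacing each echo with the command itself would turn it into a real wrapper once the placeholder paths are corrected.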
Snowball Configuration
InitialTrain
SecondaryTrain
= History
- Homira Mojtabaei has been the systems analyst/programmer on the application of Snowball to PPLRE since September 2006.
= Appendix - Sample File Contents
- test file with a single sentence.