PPLRE Annotation Issues
- This page contains the PPLRE Annotation Issues for the PPLRE Corpus.
- See: PPLRE Annotator.
Open Annotation Issues
AI070301.1
- OpenedBy: Gabor Melli.
- Description: The word "Cytoplasmic" is sometimes not tagged as a LOCATION.
- Example(s):
- In PPLRE Passage 8611.0-1 (v2.3) the phrase "cytoplasmic regulator" is typed as a PROTEIN, but should have been labeled as a LOCATION.
- In PSID=10931 the PROTEIN is the word "Cytoplasmic" and the LOCATION is "membrane" (!?): "Cytoplasmic membranes were isolated and examined from two spectinomycin-susceptible and three spectinomycin-resistant clinical strains of Neisseria gonorrhoeae ."
- In PSID=1821 the phrase "cytoplasmic fractions" is left untagged instead of being tagged as a LOCATION: "Although TadA is not predicted to have a transmembrane domain, the protein was localized to the inner membrane and cytoplasmic fractions of A. actinomycetemcomitans cells, indicating a possible peripheral association with the inner membrane ."
- Plan: TBD. Likely to be fixed in Annotator v2.6.
AI070301.2
- OpenedBy: Gabor Melli.
- Description: In the NER_ID column the first local id has no integer associated with it. E.g. it will say "localID_" instead of "localID_0".
- Plan: tbd
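A possible post-processing workaround for the missing index, as a minimal sketch only. It assumes ids in the NER_ID column have the form "localID_<integer>" and that a bare "localID_" always stands for index 0, as described above. The function name `normalize_local_id` is hypothetical and not part of the Annotator.

```python
import re

def normalize_local_id(ner_id: str) -> str:
    """Append the implied index 0 to a local id that has no integer suffix."""
    # Assumption: a bare "localID_" (any capitalization) means "localID_0".
    if re.fullmatch(r"localID_", ner_id, flags=re.IGNORECASE):
        return ner_id + "0"
    # Ids that already carry an integer, and anything else, pass through unchanged.
    return ner_id
```

Fixing the id generation inside the Annotator would be preferable; this only repairs already-produced files.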
AI070316.1
- OpenedBy: Gabor Melli.
- Description: Sometimes gene names are grouped into one word. Ideally we can find a way to split them up.
- Example: 3340.a "The V. vulnificus pilA gene is part of an operon and is clustered with three other pilus biogenesis genes, pilBCD ."
- Plan: tbd
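One way the split could work is sketched below. It assumes the common bacterial gene naming pattern of a three-letter lowercase stem followed by one capital letter per gene, so that "pilBCD" expands to pilB, pilC and pilD. The function name `split_gene_cluster` is hypothetical, and tokens that do not fit the pattern are returned unchanged.

```python
import re
from typing import List

def split_gene_cluster(token: str) -> List[str]:
    """Expand a grouped gene token such as 'pilBCD' into individual gene names."""
    # Assumption: three lowercase letters, then two or more uppercase letters.
    match = re.fullmatch(r"([a-z]{3})([A-Z]{2,})", token)
    if match is None:
        return [token]  # not a grouped gene name under this assumption
    stem, suffixes = match.groups()
    return [stem + letter for letter in suffixes]
```

For example, `split_gene_cluster("pilBCD")` yields the three gene names, while a single gene such as "pilA" is left as one token.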
AI070316.2
- OpenedBy: Gabor Melli.
- Description: Sometimes location names are grouped into a phrase that occludes the named entity. Ideally we can find a way to split them up.
- Example: PPLRE Passage 1671.a "... inner and outer membranes were ..." ⇒ "... inner membranes and outer membranes were ..."
- Plan: tbd
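The rewrite in the example could be approximated with a pattern-based expansion, sketched here under strong assumptions: the modifier list and head-noun list are hypothetical and would need to be filled from the corpus, and only the "X and Y head" shape is handled.

```python
import re

# Hypothetical word lists; the real ones would come from the LOCATION dictionary.
MODIFIERS = ("inner", "outer")
HEADS = ("membranes", "membrane")

def expand_coordination(text: str) -> str:
    """Rewrite 'inner and outer membranes' as 'inner membranes and outer membranes'."""
    mods = "|".join(re.escape(m) for m in MODIFIERS)
    heads = "|".join(re.escape(h) for h in HEADS)
    pattern = re.compile(r"\b(%s) and (%s) (%s)\b" % (mods, mods, heads))
    # Repeat the head noun after the first modifier.
    return pattern.sub(r"\1 \3 and \2 \3", text)
```

Restricting the modifiers to a known list avoids rewriting unrelated conjunctions such as "cells and outer membranes".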
AI070318.1
- Opened By: Gabor Melli.
- Description: There are some simple synonym rules that could be implemented for PROTEINs. E.g. if "[PROTEIN p1] ([PROTEIN p2])" then p1=p2.
- Example: in PPLRE Passage 6210.a.0 "The penicillin-binding protein (PBP ) patterns of six strains of Bilophila wadsworthia were investigated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis analysis and subsequent fluorography of membrane preparations labelled with [3H ] benzylpenicillin ."
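The parenthetical rule can be sketched as a regular expression over text in which entities are written inline as "[PROTEIN name]". That inline markup is an assumption made for illustration, borrowed from the rule notation above; the Annotator's actual output is tabular, and the function name `find_parenthetical_synonyms` is hypothetical.

```python
import re
from typing import List, Tuple

# Matches: [PROTEIN p1] ( [PROTEIN p2] ), tolerating spaces around the brackets.
SYNONYM_RULE = re.compile(
    r"\[PROTEIN ([^\]]+)\]\s*\(\s*\[PROTEIN ([^\]]+)\]\s*\)"
)

def find_parenthetical_synonyms(tagged_text: str) -> List[Tuple[str, str]]:
    """Return (p1, p2) pairs where p2 appears in parentheses right after p1."""
    return [
        (p1.strip(), p2.strip())
        for p1, p2 in SYNONYM_RULE.findall(tagged_text)
    ]
```

Each returned pair is a candidate for sharing a single local id during coreference resolution.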
AI070325.1
- Opened By: Gabor Melli.
- Description: In PSID=3761 sentence=0 the phrase "outer membrane" is not tagged as a LOCATION named entity (in v2.3 of the annotator).
AI070325.1
- Opened By: Gabor Melli.
- Description: In PSID=7373 sentence=? the word "tcpQ" is coreferenced with LocalID_1, except for one instance (token 186, sentence 7), in v2.3 of the annotator.
AI070326.1
- Opened By: Gabor Melli.
- Description: PSID=6299 has no annotated.tab file in either v2.3 or v2.4 of the annotator, although there are sentences in the sentences.txt file.
AI070326.2
- Opened By: Gabor Melli.
- Description: PSID=4990 has many errors in coreference resolution. The word "protainase" is assigned to many "LocalID_X"s. Note that one of the entries, the one with a capitalized "P", is a Dictionary match. It may be worth experimenting with uncased matching.
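Uncased dictionary matching could be tried along the following lines. This is a minimal sketch: the dictionary is modeled as a plain list of terms, and the function names are hypothetical rather than part of the Annotator. The example word is copied verbatim from the issue description.

```python
from typing import Dict, Iterable, Optional

def build_uncased_index(terms: Iterable[str]) -> Dict[str, str]:
    """Map the case-folded form of each dictionary term to its original spelling."""
    return {term.casefold(): term for term in terms}

def lookup_uncased(token: str, index: Dict[str, str]) -> Optional[str]:
    """Return the dictionary term matching the token regardless of case, or None."""
    return index.get(token.casefold())

# The capitalized and lowercase forms resolve to the same dictionary entry.
index = build_uncased_index(["Protainase"])
```

One caveat worth checking before adopting this: where case distinguishes a protein from its gene (as with "FrpB" and "frpB" in AI070403.1), uncased matching would merge the two.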
AI070401.1
- Opened By: Gabor Melli.
- Description: PSID=7373 One of the LOCALIZATION entries has an ExtID with a Ctrl-M character.
- Suggestion: Update Annotator to remove leading & trailing whitespace, including the Ctrl-M (carriage return) character.
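The cleanup suggested above amounts to something like the following sketch. Ctrl-M is the carriage return character ("\r"), which Python's `str.strip()` already treats as whitespace; the explicit `replace` also covers a stray carriage return inside the value. The function name and the sample id are hypothetical.

```python
def clean_ext_id(ext_id: str) -> str:
    """Remove carriage returns and leading/trailing whitespace from an ExtID."""
    # "\r" is the Ctrl-M character, typically left over from DOS line endings.
    return ext_id.replace("\r", "").strip()
```

For example, a value read as " P12345\r" would come back as "P12345".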
AI070403.1
- Opened By: Gabor Melli.
- Description: PSID=10331 sent=0 a critical protein is not tagged. The reason may be that it is the first word in the abstract: "FrpB".
- Suggestion: Because the word is tagged as a protein elsewhere in the document (twice if you count the gene frpB), it may be possible to update the ConceptClassifier to make decisions based on the whole document.
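A document-level pass of the kind suggested could look like the sketch below. It is not the ConceptClassifier; it is a simple second pass that assumes each sentence is a list of (token, tag) pairs with `None` for untagged tokens, and it copies the PROTEIN tag to untagged tokens whose exact surface form was tagged elsewhere in the same document.

```python
from typing import List, Optional, Tuple

Sentence = List[Tuple[str, Optional[str]]]

def propagate_protein_tags(document: List[Sentence]) -> List[Sentence]:
    """Tag untagged tokens as PROTEIN if the same token is a PROTEIN elsewhere."""
    # First pass: collect every surface form tagged PROTEIN anywhere in the document.
    known = {tok for sent in document for tok, tag in sent if tag == "PROTEIN"}
    # Second pass: fill in only untagged tokens; existing tags are never overwritten.
    return [
        [(tok, "PROTEIN" if tag is None and tok in known else tag) for tok, tag in sent]
        for sent in document
    ]
```

Matching is case-sensitive here, so the gene form "frpB" would not trigger a tag on "FrpB"; whether it should is a separate decision.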
AI070406.1
- Opened By: Gabor Melli.
- Description: PSID=6299 does not have v2.3 annotations (but is important because it is in the v1.3 train set).
- Example: /home/shared/PSort/PPLREdata/corpusHome/corpus6299/2_AnnotatorFiles/v2.3/AbstractDir/annotated.tab: No such file or directory
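Missing annotation files like this one could be found in bulk with a small check such as the sketch below. The directory layout is inferred from the single path in the example above (corpus<PSID>/2_AnnotatorFiles/<version>/AbstractDir/annotated.tab) and is an assumption for other passages; the function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

def annotated_tab_path(corpus_home: str, psid: int, version: str) -> Path:
    """Build the expected annotated.tab path for one passage and annotator version."""
    # Layout assumed from the example path in this issue.
    return (Path(corpus_home) / f"corpus{psid}" / "2_AnnotatorFiles"
            / version / "AbstractDir" / "annotated.tab")

def missing_annotations(corpus_home: str, psids: Iterable[int], version: str) -> List[int]:
    """Return the PSIDs whose annotated.tab file does not exist."""
    return [
        psid for psid in psids
        if not annotated_tab_path(corpus_home, psid, version).is_file()
    ]
```

Running this over the PSIDs of the v1.3 train set for each annotator version would list every passage affected in the same way as PSID=6299.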
CI0700403.2
- OpenedBy: Gabor Melli.
- Version: v1.3
- Description: In PSID=1821.9 the word "cytoplasmic" is not associated with a LOCALIZATION entity.
- Example(s): <sent=9> Although TadA is not predicted to have a transmembrane domain, the protein was localized to the inner membrane and cytoplasmic fractions of A. actinomycetemcomitans cells, indicating a possible peripheral association with the inner membrane.