PPLRE Annotation Issues

From GM-RKB

Open Annotation Issues


AI070301.1

  • OpenedBy: Gabor Melli.
  • Description: The word "Cytoplasmic" is sometimes not tagged as a LOCATION.
  • Example(s):
    • In PPLRE Passage 8611.0-1 (v2.3) the phrase "cytoplasmic regulator" is typed as a PROTEIN, but should have been labeled as a LOCATION.
    • In PSID=10931 the PROTEIN is the word "Cytoplasmic" and the LOCATION is "membrane"!? "Cytoplasmic membranes were isolated and examined from two spectinomycin-susceptible and three spectinomycin-resistant clinical strains of Neisseria gonorrhoeae ."
    • In PSID=1821 the phrase "cytoplasmic fractions" is left untagged; it should be tagged as a LOCATION. "Although TadA is not predicted to have a transmembrane domain, the protein was localized to the inner membrane and cytoplasmic fractions of A. actinomycetemcomitans cells, indicating a possible peripheral association with the inner membrane ."
  • Plan: tbd. Likely to be fixed in Annotator v2.6.

AI070301.2

  • OpenedBy: Gabor Melli.
  • Description: In the NER_ID column the first local id has no integer associated with it. E.g. it will say "localID_" instead of "localID_0".
  • Plan: tbd
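A possible stop-gap until the annotator itself is fixed is to patch the ids when the NER_ID column is read. The sketch below assumes that a bare "localID_" always stands for "localID_0", as the description suggests; the function name is hypothetical and not part of the Annotator.

```python
import re

def normalize_local_id(ner_id: str) -> str:
    """Append "0" to a local id that has no integer after the underscore.

    Assumption: the missing integer is always 0 (the first local id).
    """
    # Match exactly "localID_" with nothing following, ignoring case.
    if re.fullmatch(r"localID_", ner_id, flags=re.IGNORECASE):
        return ner_id + "0"
    return ner_id
```

Ids that already carry an integer, such as "localID_3", pass through unchanged.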

AI070316.1

  • OpenedBy: Gabor Melli.
  • Description: Sometimes gene names are grouped into one word. Ideally we can find a way to split them up.
  • Example: 3340.a "The V. vulnificus pilA gene is part of an operon and is clustered with three other pilus biogenesis genes, pilBCD ."
  • Plan: tbd
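One way the split might be approached is with a pattern over the token itself. This is only a sketch: it assumes a grouped name is a three-letter lowercase stem followed by two or more capital letters (as in "pilBCD"), which will not cover every naming convention. The names GROUPED_GENE and split_grouped_genes are hypothetical.

```python
import re
from typing import List

# Assumption: three lowercase letters, then two or more capital letters.
GROUPED_GENE = re.compile(r"^([a-z]{3})([A-Z]{2,})$")

def split_grouped_genes(token: str) -> List[str]:
    """Expand a grouped gene token such as "pilBCD" into its member genes."""
    match = GROUPED_GENE.match(token)
    if match is None:
        # Not a grouped name under the assumed pattern; leave it alone.
        return [token]
    stem, suffixes = match.groups()
    return [stem + letter for letter in suffixes]
```

For the example above, split_grouped_genes("pilBCD") gives ["pilB", "pilC", "pilD"], while a single gene name such as "pilA" is returned as-is.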

AI070316.2

  • OpenedBy: Gabor Melli.
  • Description: Sometimes location names are grouped into a phrase that occludes the named entity. Ideally we can find a way to split them up.
  • Example: PPLRE Passage 1671.a "... inner and outer membranes were ..." ⇒ "... inner membranes and outer membranes were ..."
  • Plan: tbd
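The rewrite in the example could be prototyped as a text substitution applied before tagging. The sketch below assumes a small, hand-picked list of modifiers and head nouns (both hypothetical); a real solution would probably need a parser or a larger lexicon.

```python
import re

# Assumption: hypothetical word lists, chosen only to cover the example.
MODIFIERS = r"(inner|outer|cytoplasmic)"
HEADS = r"(membranes|membrane)"

COORDINATION = re.compile(r"\b" + MODIFIERS + r" and " + MODIFIERS + r" " + HEADS + r"\b")

def expand_coordination(text: str) -> str:
    """Rewrite "<mod1> and <mod2> <head>" as "<mod1> <head> and <mod2> <head>"."""
    return COORDINATION.sub(r"\1 \3 and \2 \3", text)
```

Restricting the modifiers to a fixed list avoids rewriting unrelated coordinations (e.g. "proteins and outer membranes"), at the cost of missing modifiers that are not listed.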

AI070318.1

  • Opened By: Gabor Melli.
  • Description: Some simple synonym rules could be implemented for PROTEINs. E.g. if "[PROTEIN p1] ([PROTEIN p2])" then p1 = p2.
  • Example: in PPLRE Passage 6210.a.0 "The penicillin-binding protein (PBP ) patterns of six strains of Bilophila wadsworthia were investigated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis analysis and subsequent fluorography of membrane preparations labelled with [3H ] benzylpenicillin ."
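The parenthetical rule could be sketched as follows. The input representation is an assumption: PROTEIN mentions are given as (start, end) character offsets into the sentence, which is not necessarily how the Annotator stores them. The function name is hypothetical.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets of a PROTEIN mention

def parenthetical_synonyms(text: str, proteins: List[Span]) -> List[Tuple[str, str]]:
    """Return (p1, p2) pairs where p2 appears in parentheses right after p1."""
    pairs = []
    ordered = sorted(proteins)
    # Look at each pair of consecutive PROTEIN mentions.
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        between = text[e1:s2]   # should be just "(" plus optional spaces
        after = text[e2:]       # should start with optional spaces then ")"
        if re.fullmatch(r"\s*\(\s*", between) and re.match(r"\s*\)", after):
            pairs.append((text[s1:e1], text[s2:e2]))
    return pairs
```

Allowing optional whitespace around the parentheses matters here because the tokenized passages contain spacing such as "(PBP )".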

AI070325.1

  • Opened By: Gabor Melli.
  • Description: In PSID=3761 sentence=0 the phrase "outer membrane" is not tagged as a LOCATION named entity (in v2.3 of the annotator).

AI070325.1

  • Opened By: Gabor Melli.
  • Description: In PSID=7373 sentence=? the word "tcpQ" is coreferenced with LocalID_1, except for one instance (token 186, sentence 7). Observed in v2.3 of the annotator.

AI070326.1

  • Opened By: Gabor Melli.
  • Description: PSID=6299 has no annotated.tab file (in either v2.3 or v2.4 of the annotator), even though there are sentences in the sentences.txt file.

AI070326.2

  • Opened By: Gabor Melli.
  • Description: PSID=4990 has many errors in coreference resolution. The word "protainase" is assigned to many "LocalID_X"s. Note that one of the entries, the one with a capitalized "P", is a Dictionary match. May want to experiment with case-insensitive (uncased) matching.
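A minimal sketch of what such an experiment might look like, assuming the dictionary can be treated as a plain list of entry strings (the real Dictionary format is not described here). The function names and the sample entry are hypothetical.

```python
from typing import Dict, Iterable, Optional

def build_uncased_index(entries: Iterable[str]) -> Dict[str, str]:
    """Map the case-folded form of each dictionary entry to the entry itself."""
    return {entry.casefold(): entry for entry in entries}

def lookup(index: Dict[str, str], token: str) -> Optional[str]:
    """Return the dictionary entry matching the token regardless of case, or None."""
    return index.get(token.casefold())

# Hypothetical entry, used only to illustrate the lookup.
index = build_uncased_index(["Proteinase"])
```

One caveat: gene and protein names sometimes differ only by case (e.g. frpB vs. FrpB), so folding case may merge entries that should stay distinct.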

AI070401.1

  • Opened By: Gabor Melli.
  • Description: PSID=7373 One of the LOCALIZATION entries has an ExtID with a Ctrl-M character.
  • Suggestion: Update Annotator to remove leading & trailing whitespace, including carriage returns (Ctrl-M).
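The suggested clean-up could be as small as the sketch below. The function name clean_ext_id and the sample id in the usage note are hypothetical; the only behavior relied on is that Python's str.strip() with no argument removes leading and trailing whitespace, and that Ctrl-M is the carriage return character "\r".

```python
def clean_ext_id(ext_id: str) -> str:
    """Remove stray carriage returns and surrounding whitespace from an ExtID."""
    # Drop any embedded "\r" first, then trim leading/trailing whitespace.
    return ext_id.replace("\r", "").strip()
```

For instance, clean_ext_id("P12345\r") returns "P12345".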

AI070403.1

  • Opened By: Gabor Melli.
  • Description: PSID=10331 sent=0 a critical protein is not tagged. The reason may be that it is the first word of the abstract: "FrpB".
  • Suggestion: Because the word is tagged as a protein elsewhere in the document (twice if you count the gene frpB), it may be possible to update the ConceptClassifier to make decisions based on the whole document.
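One simple form of a document-level decision is a post-processing pass that propagates a tag to every occurrence of a surface form already tagged somewhere in the document. The sketch below is not the ConceptClassifier; it assumes tokenized sentences and a set of surface forms already tagged as PROTEIN, and the function name is hypothetical.

```python
from typing import List, Set

def propagate_protein_tags(sentences: List[List[str]], tagged: Set[str]) -> List[List[bool]]:
    """Flag every token whose surface form was tagged PROTEIN anywhere in the document."""
    return [[token in tagged for token in sentence] for sentence in sentences]
```

With tagged = {"FrpB"}, the first token of a sentence starting with "FrpB" would be flagged even if the sentence-level tagger missed it. The comparison is case-sensitive, so the gene form "frpB" is not flagged by the protein form.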

AI070406.1

  • Opened By: Gabor Melli.
  • Description: PSID=6299 does not have v2.3 annotations (but is important because it is in the v1.3 train set).
  • Example: /home/shared/PSort/PPLREdata/corpusHome/corpus6299/2_AnnotatorFiles/v2.3/AbstractDir/annotated.tab: No such file or directory

CI0700403.2

  • OpenedBy: Gabor Melli.
  • Version: v1.3
  • Description: In PSID=1821.9 the word "cytoplasmic" is not associated with a LOCALIZATION entity.

  • Example(s): <sent=9> Although TadA is not predicted to have a transmembrane domain, the protein was localized to the inner membrane and cytoplasmic fractions of A. actinomycetemcomitans cells, indicating a possible peripheral association with the inner membrane.

Closed Annotation Issues