PPLRE Annotator
A PPLRE Annotator is the Annotation System used in the PPLRE Project to add explicit Syntactic Structure and Shallow Semantic Structure to the PPLRE Corpus.
- Context:
- Annotation includes End-of-Sentence Detection, Tokenization, Part-of-Speech Tagging, Word Sense Disambiguation, Syntactic Parsing, Named Entity Recognition, and Semantic Role Labeling.
- The current version is v2.5.
- See: PPLRE Detailed Application Design, PPLRE Annotation Issues.
Current Version
The current version of the annotator is located at /home/shared/PSort/PPLRE/bin/Annotator/v2.5.1
Sample Input/Output
The following sample refers to the PubMed abstract with PSID=7181.
Sample Input
Here is the relevant portion of the source file:
cat /home/shared/PSort/PPLRE/data/corpusHome/corpus/7181/2_AnnotatorFiles/v2.3/sourcetext.txt
<ABSTRACT>
Virulent mycobacteria utilize surface-exposed polyketides to interact with host cells, but the mechanism by which these hydrophobic molecules are transported across the cell envelope to the surface of the bacteria is poorly understood. Phthiocerol dimycocerosate (PDIM), a surface-exposed polyketide lipid necessary for <italic>Mycobacterium tuberculosis</italic> virulence, is the product of several polyketide synthases including PpsE. Transport of PDIM requires MmpL7, a member of the MmpL family of RND permeases. Here we show that a domain of MmpL7 biochemically interacts with PpsE, the first report of an interaction between a biosynthetic enzyme and its cognate transporter. Overexpression of the interaction domain of MmpL7 acts as a dominant negative to PDIM synthesis by poisoning the interaction between synthase and transporter. This suggests that MmpL7 acts in complex with the synthesis machinery to efficiently transport PDIM across the cell membrane. Coordination of synthesis and transport may not only be a feature of MmpL-mediated transport in <italic>M. tuberculosis,</italic> but may also represent a general mechanism of polyketide export in many different microorganisms.
</ABSTRACT>
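The sentences.txt output contains the abstract text with the inline <italic> markup removed. The sketch below shows one way to pull that text out of a sourcetext.txt file. It is not the annotator's own code. The per-document layout `<PSID>/2_AnnotatorFiles/<version>/sourcetext.txt` is inferred from the single sample path above, and `CORPUS_ROOT`, `source_path` and `extract_abstract` are hypothetical names.

```python
import re
from pathlib import Path

# Corpus root copied from the sample path; the per-document layout below is
# inferred from that single example and may not hold for every document.
CORPUS_ROOT = Path("/home/shared/PSort/PPLRE/data/corpusHome/corpus")


def source_path(psid: int, version: str) -> Path:
    """Hypothetical helper: <root>/<PSID>/2_AnnotatorFiles/<version>/sourcetext.txt."""
    return CORPUS_ROOT / str(psid) / "2_AnnotatorFiles" / version / "sourcetext.txt"


def extract_abstract(text: str) -> str:
    """Return the <ABSTRACT> body with inline tags (e.g. <italic>) removed."""
    match = re.search(r"<ABSTRACT>(.*?)</ABSTRACT>", text, re.DOTALL)
    if match is None:
        raise ValueError("no <ABSTRACT> element found")
    # Remove opening/closing tags but keep the text they wrap.
    body = re.sub(r"</?[A-Za-z][A-Za-z0-9]*>", "", match.group(1))
    # Collapse line breaks and repeated spaces into single spaces.
    return " ".join(body.split())


if __name__ == "__main__":
    sample = ("<ABSTRACT>\nnecessary for <italic>Mycobacterium tuberculosis</italic>"
              " virulence\n</ABSTRACT>")
    print(extract_abstract(sample))  # necessary for Mycobacterium tuberculosis virulence
```

For the sample document, a call such as `extract_abstract(source_path(7181, "v2.3").read_text(encoding="utf-8"))` would read the file shown above, assuming the path exists on the machine running it.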
Sample Output
Below are the two output files generated by the PPLRE Annotator that subsequent processes reference most often: sentences.txt and annotated.tab.
sentences.txt file
Drawn from PPLRE Corpus 7181.a
Virulent mycobacteria utilize surface-exposed polyketides to interact with host cells, but the mechanism by which these hydrophobic molecules are transported across the cell envelope to the surface of the bacteria is poorly understood.
Phthiocerol dimycocerosate (PDIM), a surface-exposed polyketide lipid necessary for Mycobacterium tuberculosis virulence, is the product of several polyketide synthases including PpsE.
Transport of PDIM requires MmpL7, a member of the MmpL family of RND permeases.
Here we show that a domain of MmpL7 biochemically interacts with PpsE, the first report of an interaction between a biosynthetic enzyme and its cognate transporter.
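End-of-Sentence Detection produces the one-sentence-per-line layout of sentences.txt. As a rough illustration only, the sketch below uses a naive regular-expression splitter in place of the annotator's actual detector, whose method this page does not describe. `split_sentences` and `to_sentences_txt` are hypothetical names.

```python
import re
from typing import List

# Break after ".", "!" or "?" when whitespace and an uppercase letter follow.
# Naive stand-in for illustration; not the annotator's own detector.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text: str) -> List[str]:
    """Split running text into sentences, dropping empty pieces."""
    return [s.strip() for s in _BOUNDARY.split(text) if s.strip()]


def to_sentences_txt(text: str) -> str:
    """Render sentences one per line, the layout sentences.txt uses."""
    return "\n".join(split_sentences(text)) + "\n"


if __name__ == "__main__":
    abstract = ("Transport of PDIM requires MmpL7, a member of the MmpL family of "
                "RND permeases. Here we show that a domain of MmpL7 biochemically "
                "interacts with PpsE.")
    print(to_sentences_txt(abstract), end="")
```

This rule keeps "M. tuberculosis" intact only because a lowercase word follows the period; an abbreviation followed by a capitalized word would be split incorrectly, which is why a trained detector is preferable in practice.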
annotated.tab file
Drawn from PPLRE Corpus 7181.a
RowID | Token | Stemmed | POS | Parse Tree | Concept Type | Concept ID | | Predicate | SRL-1 | SRL-2 | SRL-3 | SRL-4 |
1 | Virulent | Virulent | JJ | (S1(S(S(NP* | (mono_cell | (0 | (2 | * | (A0* | * | * | * |
2 | mycobacteria | mycobacteria | NNS | *) | ) | ) | ) | * | *) | * | * | * |
3 | utilize | utilize | VB | (VP* | Functional_Concept | C0042153 | 4 | utilize | (V*) | * | * | * |
4 | surface-exposed | surface-exposed | JJ | (NP* | Spatial_Concept | C0205148 | 4 | * | (A1* | * | * | * |
5 | polyketides | polyketides | NNS | *) | Clinical_Attribute | C0332157 | 4 | * | *) | * | * | * |
6 | to | to | TO | (S(VP* | - | - | - | * | (AM-PNC* | * | * | * |
7 | interact | interact | VB | (VP* | Molecular_Function | C0687133 | 4 | interact | * | (V*) | * | * |
8 | with | with | IN | (PP* | - | - | - | * | * | (A2* | * | * |
9 | host | host | NN | (NP* | (cell_type | (1 | (2 | * | * | * | * | * |
10 | cells | cell | NNS | *))))))) | ) | ) | ) | * | *) | *) | * | * |
11 | , | , | , | * | - | - | - | * | * | * | * | * |
12 | but | but | CC | * | - | - | - | * | * | * | * | * |
13 | the | the | DT | (S(NP(NP* | - | - | - | * | * | * | (AM-MNR* | (A1* |
14 | mechanism | mechanism | NN | *) | Functional_Concept | C0441712 | 4 | * | * | * | *) | * |
15 | by | by | IN | (SBAR(WHPP* | - | - | - | * | * | * | * | * |
16 | which | which | WDT | (WHNP*)) | - | - | - | * | * | * | * | * |
17 | these | these | DT | (S(NP* | - | - | - | * | * | * | (A1* | * |
18 | hydrophobic | hydrophobic | JJ | * | (peptide | (2 | (2 | * | * | * | * | * |
19 | molecules | molecule | NNS | *) | ) | ) | ) | * | * | * | *) | * |
20 | are | are | AUX | (VP* | - | - | - | * | * | * | * | * |
21 | transported | transport | VBN | (VP* | Cell_Function | C0005528 | 4 | transported | * | * | (V*) | * |
22 | across | across | IN | (PP* | - | - | - | * | * | * | * | * |
23 | the | the | DT | (NP* | - | - | - | * | * | * | * | * |
24 | cell | cell | NN | * | (LOCALIZATION | (GO0005618 | (3 | * | * | * | * | * |
25 | envelope | envelope | NN | *)) | ) | ) | ) | * | * | * | * | * |
26 | to | to | TO | (PP* | - | - | - | * | * | * | * | * |
27 | the | the | DT | (NP(NP* | - | - | - | * | * | * | * | * |
28 | surface | surface | NN | *) | Spatial_Concept | C0205148 | 4 | * | * | * | * | * |
29 | of | of | IN | (PP* | - | - | - | * | * | * | * | * |
30 | the | the | DT | (NP* | - | - | - | * | * | * | * | * |
31 | bacteria | bacteria | NNS | *))))))))) | mono_cell | 4 | 2 | * | * | * | * | *) |
32 | is | is | AUX | (VP* | - | - | - | * | * | * | * | * |
33 | poorly | poorly | RB | (ADVP*) | Qualitative_Concept | C0205169 | 4 | * | * | * | * | (AM-MNR*) |
34 | understood | understand | VBN | (VP*))) | Mental_Process | C0162340 | 4 | understood | * | * | * | (V*) |
35 | . | . | . | *)) | - | - | - | * | * | |||
36 | ||||||||||||
37 | Phthiocerol | Phthiocerol | NNP | (S1(S(NP(NP* | (Lipid | (C0070976 | (4 | * | * | |||
38 | dimycocerosate | dimycocerosate | NNP | *) | ) | ) | ) | * | * | |||
39 | ( | ( | -LRB- | (PRN* | - | - | - | * | * | |||
40 | PDIM | PDIM | NNP | (NP*) | other_organic_compound | 6 | 2 | * | * | |||
41 | ) | ) | -RRB- | *) | - | - | - | * | * | |||
42 | , | , | , | * | - | - | - | * | * | |||
43 | a | a | DT | (NP(NP* | - | - | - | * | * | |||
44 | surface-exposed | surface-exposed | JJ | * | lipid | 7 | 2 | * | * | |||
45 | polyketide | polyketide | JJ | * | - | - | - | * | * | |||
46 | lipid | lipid | NN | *) | 14742191 | 1 | 5 | * | * | |||
47 | necessary | necessary | JJ | (ADJP* | Lipid | C0023779 | 4 | * | * | |||
48 | for | for | IN | (PP* | 639871 | 3 | * | * | ||||
49 | Mycobacterium | Mycobacterium | NNP | (NP* | (ORGANISM | (1773 | (3 | * | * | |||
50 | tuberculosis | tuberculosis | FW | * | ) | ) | ) | * | * | |||
51 | virulence | virulence | NN | *)))) | Biologic_Function | C0042765 | 4 | * | * | |||
52 | , | , | , | *) | - | - | - | * | * | |||
53 | is | is | AUX | (VP* | - | - | - | * | * | |||
54 | the | the | DT | (NP(NP* | - | - | - | * | * | |||
55 | product | product | NN | *) | 3707459 | 1 | 5 | * | * | |||
56 | of | of | IN | (PP* | - | - | - | * | * | |||
57 | several | several | JJ | (NP(NP* | Quantitative_Concept | C0443302 | 4 | * | (A2* | |||
58 | polyketide | polyketide | JJ | * | (PROTEIN | (localID_0 | (1 | * | * | |||
59 | synthases | synthases | NNS | *) | ) | ) | ) | * | *) | |||
60 | including | include | VBG | (PP* | Functional_Concept | C0332257 | 4 | including | (V*) | |||
61 | PpsE | PpsE | NNP | (NP*)))))) | PROTEIN | localID_1 | 1 | * | (A1*) | |||
62 | . | . | . | *)) | - | - | - | * | * | |||
63 | ||||||||||||
64 | Transport | Transport | NNP | (S1(S(NP(NP*) | Cell_Function | C0005528 | 4 | * | (A1* | |||
65 | of | of | IN | (PP* | - | - | - | * | * | |||
66 | PDIM | PDIM | NNP | (NP*))) | PROTEIN | localID_2 | 1 | * | *) | |||
67 | requires | require | VBZ | (VP* | 2602586 | 1 | 5 | requires | (V*) | |||
68 | MmpL7 | MmpL7 | NNP | (NP(NP*) | PROTEIN | localID_3 | 1 | * | (A0* | |||
69 | , | , | , | * | - | - | - | * | * | |||
70 | a | a | DT | (NP(NP* | - | - | - | * | * | |||
71 | member | member | NN | *) | Population_Group | C0680022 | 4 | * | * | |||
72 | of | of | IN | (PP* | - | - | - | * | * | |||
73 | the | the | DT | (NP(NP* | - | - | - | * | * | |||
74 | MmpL | MmpL | NNP | * | (PROTEIN | (localID_4 | (1 | * | * | |||
75 | family | family | NN | *) | ) | ) | ) | * | * | |||
76 | of | of | IN | (PP* | - | - | - | * | * | |||
77 | RND | RND | JJ | (NP* | (PROTEIN | (localID_5 | (1 | * | * | |||
78 | permeases | permeases | NNS | *))))))) | ) | ) | ) | * | *) | |||
79 | . | . | . | *)) | - | - | - | * | * | * | ||
80 | ||||||||||||
81 | Here | Here | RB | (S1(S(ADVP*) | 109485 | 1 | 5 | * | (AM-LOC*) | * | ||
82 | we | we | PRP | (NP*) | - | - | - | * | (A0*) | * | ||
83 | show | show | VBP | (VP* | 656725 | 2 | 5 | show | (V*) | * | ||
84 | that | that | IN | (SBAR* | - | - | - | * | (A1* | * | ||
85 | a | a | DT | (S(NP(NP* | - | - | - | * | * | (A1* | ||
86 | domain | domain | NN | *) | 8437765 | 2 | 5 | * | * | * | ||
87 | of | of | IN | (PP* | - | - | - | * | * | * | ||
88 | MmpL7 | MmpL7 | NNP | (NP*))) | PROTEIN | localID_3 | 1 | * | * | *) | ||
89 | biochemically | biochemically | RB | (ADVP*) | Functional_Concept | C0205474 | 4 | * | * | (AM-ADV*) | ||
90 | interacts | interact | VBZ | (VP* | Molecular_Function | C0687133 | 4 | interacts | * | (V*) | ||
91 | with | with | IN | (PP* | - | - | - | * | * | (A2* | ||
92 | PpsE | PpsE | NNP | (NP(NP*) | PROTEIN | localID_1 | 1 | * | * | * | ||
93 | , | , | , | * | - | - | - | * | * | * | ||
94 | the | the | DT | (NP(NP* | - | - | - | * | * | * | ||
95 | first | first | JJ | * | Quantitative_Concept | C0205435 | 4 | * | * | * | ||
96 | report | report | NN | *) | Intellectual_Product | C0684224 | 4 | * | * | * | ||
97 | of | of | IN | (PP* | - | - | - | * | * | * | ||
98 | an | an | DT | (NP(NP* | - | - | - | * | * | * | ||
99 | interaction | interaction | NN | *) | Molecular_Function | C0687133 | 4 | * | * | * | ||
100 | between | between | IN | (PP* | - | - | - | * | * | * | ||
101 | a | a | DT | (NP(NP* | - | - | - | * | * | * | ||
102 | biosynthetic | biosynthetic | JJ | * | - | - | - | * | * | * | ||
103 | enzyme | enzyme | NN | *) | PROTEIN | 638536 | 3 | * | * | * | ||
104 | and | and | CC | * | - | - | - | * | * | * | ||
105 | its | its | PRP$ | (NP* | - | - | - | * | * | * | ||
106 | cognate | cognate | JJ | * | 2042649 | 1 | 5 | * | * | * | ||
107 | transporter | transporter | NN | *)))))))))))) | PROTEIN | localID_6 | 1 | * | *) | *) | ||
108 | . | . | . | *)) | - | - | - | * | * | * | ||
109 |
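A downstream process reading annotated.tab has to split each row on "|", group rows into sentences, and often rebuild the parse tree. The sketch below does this under assumptions drawn only from the sample: rows whose only cell is the RowID (36, 63, 80) separate sentences; trailing empty cells are collapsed; the unlabeled numeric column after Concept ID is given the hypothetical name `concept_source`; and the Parse Tree column follows the CoNLL-2005 convention, where "*" marks the position of the token's leaf. `Token`, `parse_row`, `read_sentences` and `rebuild_tree` are illustrative names, not part of the annotator.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

# Penn Treebank escapes keep bracket tokens from unbalancing the rebuilt tree.
_ESCAPES = {"(": "-LRB-", ")": "-RRB-"}


@dataclass
class Token:
    row_id: int
    token: str
    stemmed: str
    pos: str
    parse: str           # tree fragment; "*" marks this token's leaf
    concept_type: str
    concept_id: str
    concept_source: str  # unlabeled numeric column after Concept ID (hypothetical name)
    predicate: str
    srl: List[str] = field(default_factory=list)  # SRL-1, SRL-2, ...; may be truncated


def parse_row(line: str) -> Optional[Token]:
    """Parse one annotated.tab row; return None for headers and separators."""
    fields = [f.strip() for f in line.split("|")]
    while fields and fields[-1] == "":
        fields.pop()  # trailing "|" and collapsed "||" leave empty cells
    if not fields or not fields[0].isdigit():
        return None  # header line, "Drawn from ..." line, or blank line
    if len(fields) == 1:
        return None  # sentence separator such as "36 | ||||||||||||"
    if len(fields) < 10:
        raise ValueError(f"row {fields[0]} has only {len(fields)} cells: {line!r}")
    row_id, token, stemmed, pos, parse, ctype, cid, csrc, pred, *srl = fields
    return Token(int(row_id), token, stemmed, pos, parse, ctype, cid, csrc, pred, srl)


def read_sentences(lines: Iterable[str], skip_malformed: bool = False) -> List[List[Token]]:
    """Group token rows into sentences; any non-token row closes the current one."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in lines:
        try:
            tok = parse_row(line)
        except ValueError:
            if skip_malformed:
                continue
            raise
        if tok is not None:
            current.append(tok)
        elif current:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def rebuild_tree(sentence: List[Token]) -> str:
    """Substitute "(POS token)" for the "*" in each fragment and join the pieces."""
    parts = []
    for tok in sentence:
        leaf = f"({tok.pos} {_ESCAPES.get(tok.token, tok.token)})"
        parts.append(tok.parse.replace("*", leaf, 1))
    return " ".join(parts)
```

Row 48 in this sample has one cell fewer than its neighbours, so `parse_row` rejects it rather than silently misaligning the columns; pass `skip_malformed=True` to drop such rows instead. Some rows (for example row 35) also carry fewer SRL cells than the sentence has predicates, so `srl` can be shorter than expected.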
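The SRL columns appear to use the CoNLL-2005 start-end notation: "(A0*" opens an argument, "*)" closes it, and "(V*)" marks the predicate itself. In sentence 1, SRL-1 through SRL-4 carry "(V*)" on utilize, interact, transported and understood, the four filled Predicate cells in order, which suggests that the k-th SRL column belongs to the k-th predicate. The Concept Type column follows a different convention: "(label" ... ")" for multi-token concepts and a bare label for a single token. The decoders below are a sketch built on those observations; `decode_srl`, `decode_concepts` and `frames` are hypothetical names.

```python
import re
from typing import List, Optional, Tuple

Span = Tuple[str, int, int]  # (label, first_token, last_token), 0-based, inclusive

# An SRL cell opens a span with "(LABEL*", e.g. "(A0*" or "(AM-PNC*".
_SRL_OPEN = re.compile(r"\(([^*()]+)\*")


def decode_srl(column: List[str]) -> List[Span]:
    """Decode one CoNLL-2005-style SRL column ("(A0*", "*", "*)", "(V*)")."""
    spans: List[Span] = []
    label: Optional[str] = None
    start = 0
    for i, cell in enumerate(column):
        m = _SRL_OPEN.match(cell)
        if m:
            label, start = m.group(1), i
        if cell.endswith(")") and label is not None:
            spans.append((label, start, i))
            label = None
    return spans


def decode_concepts(column: List[str]) -> List[Span]:
    """Decode the Concept Type column: "(label" ... ")" or a bare single-token label."""
    spans: List[Span] = []
    label: Optional[str] = None
    start = 0
    for i, cell in enumerate(column):
        if cell.startswith("(") and not cell.endswith(")"):
            label, start = cell[1:], i
        elif cell == ")" and label is not None:
            spans.append((label, start, i))
            label = None
        elif label is None and cell not in ("-", ")", ""):
            spans.append((cell, i, i))
    return spans


def frames(tokens: List[str], predicates: List[str],
           srl_columns: List[List[str]]) -> List[Tuple[str, List[Tuple[str, str]]]]:
    """Pair the k-th filled Predicate cell with the k-th SRL column."""
    filled = [p for p in predicates if p not in ("*", "-", "")]
    result = []
    for predicate, column in zip(filled, srl_columns):
        args = [(lab, " ".join(tokens[s:e + 1])) for lab, s, e in decode_srl(column)]
        result.append((predicate, args))
    return result
```

Applied to rows 1 to 10 of sentence 1, `frames` pairs utilize with A0 "Virulent mycobacteria", A1 "surface-exposed polyketides" and AM-PNC "to interact with host cells", and pairs interact with A2 "with host cells". Arguments are returned as a list rather than a dictionary because a frame can repeat a label.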