EAGLi System
An Engine for Question-Answering in Genomics Literature (EAGLi) System is a Deep QA System and Search Engine for MEDLINE.
- AKA: EAGL System.
- Context:
- Example(s):
- Q: "What diseases are associated with brca1?"; EAGLi system output:
| Answer | Score |
|---|---|
| 1) Neoplasms | 88 (243 matches in 45 documents) |
| 2) Breast Neoplasms | 52 (95 matches in 30 documents) |
| 3) Ovarian Neoplasms | 22 (31 matches in 15 documents) |
| 4) DNA Damage | 15 (26 matches in 9 documents) |
| 5) Prostatic Neoplasms | 7 (11 matches in 4 documents) |
| 6) Triple Negative Breast Neoplasms | 6 (6 matches in 4 documents) |
| 7) Neoplasm Metastasis | 5 (7 matches in 3 documents) |
| 8) Genomic Instability | 5 (5 matches in 4 documents) |
| 9) Lung Neoplasms | 4 (4 matches in 2 documents) |
| 10) Breast Neoplasms, Male | 4 (4 matches in 3 documents) |
- Counter-Example(s):
- See: Artificial Neural Network, Natural Language Processing Task, QA Task, QA Service, Chatterbot, Turing Test.
References
2018
- (BiTEM, 2018) ⇒ http://bitem.hesge.ch/resource/eagli-eagle-eye Retrieved: 2018-12-29
2015
- (Gobeill et al., 2015) ⇒ Julien Gobeill, Arnaud Gaudinat, Emilie Pasche, Dina Vishnyakova, Pascale Gaudet, Amos Bairoch, and Patrick Ruch. (2015). “Deep Question Answering for Protein Annotation.” In: Database Journal, 2015. doi:10.1093/database/bav081
- QUOTE: EAGLi is composed of three independent components that are illustrated in Figure 3. First, given the user’s question, a question categorizer identifies the target set (i.e. the candidate set of possible answers) and the reformulated query which will be used by the Information Retrieval component. The target set is a subset of concepts belonging to a controlled vocabulary, which are likely to be answers to the user’s question. For instance, for the question 'what molecular functions are affected by Aminophenols?', the question categorizer identifies the molecular_function axis of the GO as the target set. It means that ultimately, EAGLi will propose these and only these GO terms as answers. In our study, the target set is only of two types: (i) the molecular function axis of the GO for one of the benchmark of questions and (ii) the cellular component axis of the GO for the other benchmark. The question categorizer also outputs the reformulated query needed to retrieve the relevant documents. The reformulation is needed because non-informative words need to be discarded before querying PubMed. Thus, in the previous example, the query to PubMed only contained 'Aminophenols'. The question categorizer was the default EAGLi component and remained unchanged during all the study. Then, given the reformulated query, the Information Retrieval component retrieves a set of relevant citations in MEDLINE.
Figure 3. Overall workflow of the EAGLi platform. The input is a question formulated in natural language, the output is a set of candidate answers extracted from a set of retrieved MEDLINE abstracts.
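The three-step workflow quoted above (question categorization, query reformulation, document retrieval) can be sketched in a few lines. The sketch below is only an illustration of the general idea under stated assumptions: the cue-word rules and stop list are invented for this example, and retrieval is shown against the public NCBI E-utilities esearch service rather than EAGLi's own Information Retrieval component.

```python
import requests

# Toy stop list for query reformulation; EAGLi's actual word lists are not shown here.
STOPWORDS = {"what", "which", "is", "are", "the", "by", "affected",
             "functions", "molecular", "diseases", "associated", "with"}

def categorize_and_reformulate(question: str) -> tuple[str, str]:
    """Toy question categorizer: pick a target vocabulary axis from cue words
    and strip non-informative words to build the literature query.
    The cue rules below are invented for illustration."""
    q = question.lower().rstrip("?")
    if "molecular function" in q:
        target_set = "GO:molecular_function"
    elif "cellular component" in q:
        target_set = "GO:cellular_component"
    else:
        target_set = "MeSH:diseases"
    query = " ".join(w for w in q.split() if w not in STOPWORDS)
    return target_set, query

def retrieve_pmids(query: str, retmax: int = 20) -> list[str]:
    """Retrieve MEDLINE citation IDs via the public NCBI E-utilities service
    (one possible retrieval back end; EAGLi's IR component differs)."""
    resp = requests.get(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
        params={"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["esearchresult"]["idlist"]

target, query = categorize_and_reformulate(
    "What molecular functions are affected by Aminophenols?")
print(target, "<-", query)        # GO:molecular_function <- aminophenols
print(retrieve_pmids(query)[:5])  # a few PMIDs of matching citations
```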
2012
- (Bauer & Berleant, 2012) ⇒ Michael A. Bauer, and Daniel Berleant. (2012). “Usability Survey of Biomedical Question Answering Systems.” In: Human Genomics Journal, 6. ISBN:1479-7364 doi:10.1186/1479-7364-6-17
- QUOTE: EAGLi is quite slow and may not truly be ready for high volume traffic. In response to a question that the system ‘understands,’ a list of possible answers is displayed with corresponding levels of confidence indicated. Links to abstracts are also provided and grouped by which answers to the question they support. If a question is not understood, EAGLi returns a list of abstracts that contained some of the query terms. The program also provides a short snippet of text from the abstract that contains keywords from the query. Next to the text there are links to PubMed and to a page they call a ‘semantic summary’ which displays the entire abstract and a list of all the Gene Ontology and SwissProt terms that were matched, along with the phrase they were mapped to. A score is given to indicate to the user the strength of the mapping. This information gives the user a way to understand why the system has determined that a particular abstract supports an answer or was given as the answer. A link to a matrix is provided on the main results page that can quickly give the user an overview of the terms that were matched in the abstracts. This system provides a degree of transparency to the retrieval process that traditional information retrieval systems hide from the user. That in turn supports efforts by the user to efficiently figure out how to best phrase a query or question to get the most relevant information.
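The result display described above (ranked answers with confidence levels, supporting abstracts with keyword snippets, and a "semantic summary" listing matched Gene Ontology and SwissProt terms with mapping scores) implies a nested result structure. The sketch below is a hypothetical reconstruction for illustration only; the class and field names and the example values are assumptions, not EAGLi's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class TermMapping:
    ontology_term: str    # matched Gene Ontology or SwissProt term
    matched_phrase: str   # phrase in the abstract the term was mapped to
    score: float          # strength of the mapping shown to the user

@dataclass
class SupportingAbstract:
    pmid: str                          # PubMed identifier (placeholder below)
    snippet: str                       # short text containing query keywords
    mappings: list[TermMapping] = field(default_factory=list)  # "semantic summary"

@dataclass
class CandidateAnswer:
    answer: str
    confidence: float
    support: list[SupportingAbstract] = field(default_factory=list)

# Hypothetical result object for one candidate answer to the BRCA1 question.
result = CandidateAnswer(
    answer="Breast Neoplasms",
    confidence=52.0,
    support=[SupportingAbstract(
        pmid="<pmid>",
        snippet="... BRCA1 mutation carriers and risk of breast cancer ...",
        mappings=[TermMapping("Breast Neoplasms", "breast cancer", 0.9)],
    )],
)
print(result.answer, result.confidence, len(result.support))
```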
2009
- (Gobeill et al., 2009) ⇒ J. Gobeill, E. Pasche, D. Teodoro, A.-L. Veuthey, C. Lovis, and P. Ruch. (2009). “Question Answering for Biology and Medicine.” In: Proceedings of the 9th International Conference on Information Technology and Applications in Biomedicine.
- QUOTE: We designed and evaluated a new Question-Answering system to help finding answers to natural language questions in medical and biological digital libraries. The evaluation was performed using a benchmark generated from curated legacy databases for molecular biology (UniProt/UniMed) and medicinal chemistry (DrugBank). The system is currently able to find answers with a top precision and recall after ten answers of about 70% on both benchmarks. These preliminary results confirm that redundancy of information in literature is so high that simple retrieval and answer computation methods can achieve results competitive with more elaborated approaches based on advanced weighting schema inherited from Information Retrieval. In contrast, the use of knowledge-based resources such as terminology and ontology results in a statistically significant improvement.
Current developments include developing query-specific Information Extraction modules such as those needed to detect protein-protein interactions [1] in order to answer more complex questions such as etiological or definitional question. The Question Answering system and most of its components are freely available on the EAGLi platform at http://eagl.unige.ch/EAGLi/ (or type “EAGLi” on any web search engine) or upon request to the authors.
- ↑ Frederic Ehrler (2009). "Modular text mining for protein-protein interactions extraction" (Doctoral dissertation, University of Geneva).
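The 2009 evaluation above reports a top precision and recall "after ten answers" of about 70% on both benchmarks. One common reading of such a cutoff metric is precision and recall computed over the top ten ranked answers; the sketch below follows that reading, with invented ranked answers and an invented gold-standard set purely for illustration.

```python
def precision_recall_at_k(ranked: list[str], gold: set[str], k: int = 10) -> tuple[float, float]:
    """Precision and recall over the top-k ranked answers (here k = 10),
    one plausible reading of 'precision and recall after ten answers'."""
    top_k = ranked[:k]
    hits = sum(1 for answer in top_k if answer in gold)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall

# Toy check with invented answers and an invented gold-standard set.
ranked = ["Neoplasms", "Breast Neoplasms", "Ovarian Neoplasms", "DNA Damage"]
gold = {"Breast Neoplasms", "Ovarian Neoplasms", "Prostatic Neoplasms"}
print(precision_recall_at_k(ranked, gold))  # (0.5, 0.666...)
```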