Att-BiLSTM-CRF Training System
An Att-BiLSTM-CRF Training System is a BiLSTM-CRF Training System that is based on an Attention Mechanism and that implements an Att-BiLSTM-CRF Training Algorithm.
- AKA: Attention-based BiLSTM-CRF Training System.
- Example(s):
- An Att-BiLSTM-CRF Training System [1] (sketched conceptually after this list) using a training set (trainfile), a development set (devfile), a testing set (testfile) and a word embedding model (word_embedding.model): python AttenTrain.py --train trainfile --dev devfile --test testfile --pre_emb word_embedding.model
- Application: a Document Tagging System using a pretrained Att-BiLSTM-CRF model and tokenized input data (inputfile, which must contain one document per line): python Atten_tagger.py --model Att-BiLSTM-CRF.model --input inputfile --output outputfile. Output data are saved in outputfile.
- …
- Counter-Example(s):
- a D3NER.
- a Bidirectional LSTM-CNN-CRF Training System.
- a Unidirectional LSTM-based Language Modeling System.
- a Unidirectional LSTM Recurrent Neural Network Training System.
- a seq2seq-based Neural Modeling System.
- a Bidirectional LSTM-CNN Training System.
- a Deep Stacked Bidirectional LSTM Recurrent Neural Network Training System.
- a Deep Stacked Unidirectional LSTM Recurrent Neural Network Training System.
- See: LSTM System, neuroner.com, Conditional Random Field, Bidirectional Recurrent Neural Network.
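A minimal, hypothetical sketch of the idea behind such a system (written here in PyTorch; it is not the authors' released implementation, and names such as AttBiLSTMEncoder are illustrative): a BiLSTM encodes every token of a document, a document-level attention layer lets each token state attend to all other token states (including other occurrences of the same word), and the concatenated local-plus-context representation is projected to per-tag emission scores that a CRF layer would decode. The CRF layer itself is omitted for brevity.

```python
# Illustrative sketch only (not the authors' code): BiLSTM + document-level
# attention producing emission scores for a downstream CRF layer.
import torch
import torch.nn as nn


class AttBiLSTMEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=100, num_tags=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim,
                              bidirectional=True, batch_first=True)
        # Emission layer sees the local BiLSTM state plus the attention context.
        self.emissions = nn.Linear(4 * hidden_dim, num_tags)

    def forward(self, doc_token_ids):
        # doc_token_ids: (1, doc_len) -- one whole document, so attention can
        # see every occurrence of a token and encourage consistent tagging.
        h, _ = self.bilstm(self.embed(doc_token_ids))   # (1, doc_len, 2*hidden)
        scores = torch.matmul(h, h.transpose(1, 2))     # token-token similarity
        alpha = torch.softmax(scores, dim=-1)           # attention weights
        context = torch.matmul(alpha, h)                # document-level context
        return self.emissions(torch.cat([h, context], dim=-1))


# Usage sketch: emission scores for a toy 6-token "document"; a CRF layer
# (omitted here) would decode the best tag sequence from these scores plus
# learned tag-transition scores.
model = AttBiLSTMEncoder(vocab_size=5000)
emissions = model(torch.randint(0, 5000, (1, 6)))
print(emissions.shape)  # torch.Size([1, 6, 3])
```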
References
2018
- (Ling et al., 2018) ⇒ Ling Luo, Zhihao Yang, Pei Yang, Yin Zhang, Lei Wang, Hongfei Lin, and Jian Wang. (2018). “An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition.” In: Bioinformatics, 34(8).
- ABSTRACT:
Motivation. In biomedical research, chemical is an important class of entities, and chemical named entity recognition (NER) is an important task in the field of biomedical information extraction. However, most popular chemical NER methods are based on traditional machine learning and their performances are heavily dependent on the feature engineering. Moreover, these methods are sentence-level ones which have the tagging inconsistency problem.
Results. In this paper, we propose a neural network approach, i.e. attention-based bidirectional Long Short-Term Memory with a conditional random field layer (Att-BiLSTM-CRF), to document-level chemical NER. The approach leverages document-level global information obtained by attention mechanism to enforce tagging consistency across multiple instances of the same token in a document. It achieves better performances with little feature engineering than other state-of-the-art methods on the BioCreative IV chemical compound and drug name recognition (CHEMDNER) corpus and the BioCreative V chemical-disease relation (CDR) task corpus (the F-scores of 91.14 and 92.57%, respectively).
Availability and implementation. Data and code are available at https://github.com/lingluodlut/Att-ChemdNER.
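A generic formulation of the document-level attention described in the abstract (the dot-product scoring shown here is only illustrative; the paper's exact scoring function may differ): each token's BiLSTM state h_t attends over all n token states in the document, and the resulting context g_t is combined with h_t before the CRF layer.

```latex
\alpha_{t,j} = \frac{\exp\!\left(h_t^{\top} h_j\right)}{\sum_{k=1}^{n} \exp\!\left(h_t^{\top} h_k\right)},
\qquad
g_t = \sum_{j=1}^{n} \alpha_{t,j}\, h_j,
\qquad
z_t = [\, h_t ; g_t \,],
```

where z_t feeds the layer that produces the per-tag emission scores decoded by the CRF.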