Bidirectional LSTM/CRF (BiLSTM-CRF) Training System
A Bidirectional LSTM/CRF (BiLSTM-CRF) Training System is a bidirectional LSTM training system that includes a CRF training system and implements a bidirectional LSTM/CRF training algorithm to train a BiLSTM-CRF model.
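As a rough orientation, the core architecture such a system trains can be sketched in PyTorch as a bidirectional LSTM that produces per-token emission scores, combined with a learned tag-transition matrix used by the CRF layer. The class and parameter names below (SimpleBiLSTMCRF, tagset_size, hidden_dim) are illustrative assumptions, not code from any specific system referenced on this page:
import torch
import torch.nn as nn

class SimpleBiLSTMCRF(nn.Module):
    # Illustrative sketch: BiLSTM emission scores plus CRF transition parameters.
    def __init__(self, vocab_size, tagset_size, embedding_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embedding_dim)
        # Bidirectional LSTM; hidden_dim is split across the two directions.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim // 2,
                            bidirectional=True, batch_first=True)
        # Linear layer mapping BiLSTM states to per-tag emission scores.
        self.emission = nn.Linear(hidden_dim, tagset_size)
        # CRF parameters: transitions[i, j] = score of moving from tag j to tag i.
        self.transitions = nn.Parameter(torch.randn(tagset_size, tagset_size))

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> emissions: (batch, seq_len, tagset_size)
        states, _ = self.lstm(self.embed(token_ids))
        return self.emission(states)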
- Context:
- It can (typically) include a Bidirectional LSTM Training System.
- It can (typically) include a CRF Training System.
- It can range from being a Shallow Bidirectional LSTM Training System to being a Deep Bidirectional LSTM Training System.
- …
- Example(s):
- LopezGG/NN_NER_tensorFlow [1].
- D3NER [2].
- a Bidirectional LSTM-CNN-CRF Training System, such as the BiLSTM-CNN-CRF (Reimers & Gurevych, 2017) Training System [3]:
- BiLSTM-CNN-CRF Training System for NER in German using the GermEval 2014 dataset,
- BiLSTM-CNN-CRF network training for part-of-speech tagging using the Universal Dependencies dataset,
- BiLSTM-CNN-CRF network training for Chunking in English using the CoNLL 2000 dataset,
- Multi-task learning using a BiLSTM-CNN-CRF implementation.
- an Att-BiLSTM-CRF Training System [4]:
- Training a BiLSTM-CRF model using a training set (trainfile), a development set (devfile), a testing set (testfile), and a word embedding model (word_embedding.model); a purely illustrative sketch of such a command-line entry point follows this example:
python train.py --train trainfile --dev devfile --test testfile --pre_emb word_embedding.model
- Tagging documents using a pretrained BiLSTM-CRF model and tokenized input data (inputfile) that must contain one document per line; the output data are saved in outputfile:
python tagger.py --model BiLSTM-CRF.model --input inputfile --output outputfile
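As a purely hypothetical sketch of how such an entry point might wire the flags shown above to a training routine (the argparse layout below is an assumption for illustration, not code from the Att-BiLSTM-CRF repository):
import argparse

# Hypothetical argument parsing mirroring the flags shown above; the actual
# Att-BiLSTM-CRF train.py may be organized differently.
parser = argparse.ArgumentParser(description="Train a BiLSTM-CRF tagger")
parser.add_argument("--train", required=True, help="path to the training set")
parser.add_argument("--dev", required=True, help="path to the development set")
parser.add_argument("--test", required=True, help="path to the testing set")
parser.add_argument("--pre_emb", help="path to a pretrained word embedding model")
args = parser.parse_args()
print(args.train, args.dev, args.test, args.pre_emb)  # these paths would be handed to the trainer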
- a Kaniblu PyTorch-BiLSTM-CRF (Kang Min Yoo, 2017) [5]:
- Usage:
python train.py --input-path sentences.txt --input-path pos.txt --label-path labels.txt
- the PyTorch Advanced Tutorial on the Bi-LSTM CRF, by Robert Guthrie; implementation notes [6]:
import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.optim as optim

def argmax(vec):
    # return the index of the highest-scoring tag in vec
    (...)

class BiLSTM_CRF(nn.Module):
    # BiLSTM emission scores combined with CRF transition scores
    (...)

model = BiLSTM_CRF(len(word_to_ix), tag_to_ix, EMBEDDING_DIM, HIDDEN_DIM)
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
(...)
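A minimal sketch of the tutorial-style training loop might continue the snippet above as follows, assuming the tutorial's prepare_sequence helper, training_data list, and the model's neg_log_likelihood method are defined as in the full tutorial code (the epoch count is an arbitrary placeholder):
# Minimal sketch of the tutorial-style training loop.
for epoch in range(300):
    for sentence, tags in training_data:
        model.zero_grad()  # clear accumulated gradients
        sentence_in = prepare_sequence(sentence, word_to_ix)
        targets = torch.tensor([tag_to_ix[t] for t in tags], dtype=torch.long)
        loss = model.neg_log_likelihood(sentence_in, targets)  # CRF negative log-likelihood
        loss.backward()
        optimizer.step()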
- Counter-Example(s):
- See: LSTM System, neuroner.com, Conditional Random Field, Bidirectional Recurrent Neural Network, Dynamic Neural Network.
References
2018a
- (Reimers & Gurevych, 2018) ⇒ BiLSTM-CNN-CRF repository: https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf Retrieved: 2018-06-10.
- QUOTE: This repository contains a BiLSTM-CRF implementation that is used for NLP Sequence Tagging (for example POS-tagging, Chunking, or Named Entity Recognition). The implementation is based on Keras 2.1.5 and can be run with Tensorflow 1.7.0 as backend. It was optimized for Python 3.5/3.6. It does not work with Python 2.7.
The architecture is described in our papers:
- The implementation is highly configurable, so you can tune the different hyperparameters easily. You can use it for Single Task Learning as well as different options for Multi-Task Learning. You can also use it for Multilingual Learning by using multilingual word embeddings. …
2018b
- (Yoo, 2018) ⇒ BiLSTM-CRF on PyTorch repository: https://github.com/kaniblu/pytorch-bilstmcrf Retrieved: 2018-06-10.
- QUOTE: An efficient BiLSTM-CRF implementation that leverages mini-batch operations on multiple GPUs (...)
Training
Prepare data first. Data must be supplied with separate text files for each input or target label type. Each line contains a single sequence, and each pair of tokens is separated with a space. For example, for the task of Named Entity Recognition using words and Part-of-Speech tags, the input and label files might be prepared as follows:
(sents.txt)
the fat rat sat on a mat
...
(pos.txt)
det adj noun verb prep det noun
...
(labels.txt)
O O B-Animal O O O B-Object
...
- Then the above input and label files are provided to train.py using --input-path and --label-path respectively:
python train.py --input-path sents.txt --input-path pos.txt --label-path labels.txt
You might need to set up several more parameters in order to make it work. Check out examples/atis for an example of training a simple BiLSTM-CRF model with the ATIS dataset. Run python preprocess.py in the example directory to convert the dataset to a train.py-friendly format, then run python ../../train.py --config train-atis.yml to see a running example. The example configuration assumes that standalone tensorboard is installed (you could turn it off in the configuration file).
For more information on the configurations, check out python train.py --help.
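As an illustrative (and hypothetical) reading routine for line-aligned, whitespace-separated files such as the sents.txt, pos.txt, and labels.txt above, the sequences might be loaded along these lines (read_sequences is an assumed helper, not part of the repository):
def read_sequences(path):
    # One whitespace-separated token sequence per line (illustrative helper).
    with open(path, encoding="utf-8") as f:
        return [line.split() for line in f if line.strip()]

sents = read_sequences("sents.txt")
pos_tags = read_sequences("pos.txt")
labels = read_sequences("labels.txt")
assert len(sents) == len(pos_tags) == len(labels)  # files must be line-aligned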
2017a
- (PyTorch, 2017) ⇒ "Bi-LSTM Conditional Random Field Discussion." In: "Advanced: Making Dynamic Decisions and the Bi-LSTM CRF."
- QUOTE: For this section, we will see a full, complicated example of a Bi-LSTM Conditional Random Field for named-entity recognition. The LSTM tagger above is typically sufficient for part-of-speech tagging, but a sequence model like the CRF is really essential for strong performance on NER. Familiarity with CRF’s is assumed. Although this name sounds scary, all the model is is a CRF but where an LSTM provides the features. This is an advanced model though, far more complicated than any earlier model in this tutorial. If you want to skip it, that is fine. To see if you’re ready, see if you can:
- Write the recurrence for the Viterbi variable at step i for tag k.
- Modify the above recurrence to compute the forward variables instead.
- Modify again the above recurrence to compute the forward variables in log-space (hint: log-sum-exp)
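As a minimal sketch of the log-space forward recurrence the quoted tutorial alludes to (illustrative only; emissions[t, i] is assumed to be the BiLSTM score of tag i at step t, and transitions[i, j] an assumed CRF score for moving from tag j to tag i):
import torch

def log_sum_exp(vec):
    # Numerically stable log-sum-exp over the last dimension.
    m = vec.max(dim=-1, keepdim=True).values
    return (m + (vec - m).exp().sum(dim=-1, keepdim=True).log()).squeeze(-1)

def forward_algorithm(emissions, transitions):
    # emissions: (seq_len, num_tags); transitions[i, j]: score of tag j -> tag i.
    alpha = emissions[0]  # log-space forward variables at the first step
    for t in range(1, emissions.size(0)):
        # alpha[j] + transitions[i, j] + emissions[t, i], reduced (in log-space) over previous tag j
        alpha = log_sum_exp(alpha.unsqueeze(0) + transitions + emissions[t].unsqueeze(1))
    return log_sum_exp(alpha)  # log partition function log Z

emissions = torch.randn(7, 5)     # e.g., 7 tokens, 5 tags
transitions = torch.randn(5, 5)
log_Z = forward_algorithm(emissions, transitions)
Replacing log_sum_exp with a max over the previous tag gives the corresponding Viterbi recurrence mentioned in the quote.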
2017b
- (Github, 2017) ⇒ LSTM-CRF in PyTorch: https://github.com/threelittlemonkeys/lstm-crf-pytorch
- QUOTE: A PyTorch implementation of bidirectional LSTM-CRF for sequence tagging, adapted from the PyTorch tutorial.
Supported features:
- Mini-batch training with CUDA.
- Vectorized computation of CRF loss
2017c
- (Reimers & Gurevych, 2017a) ⇒ Nils Reimers, and Iryna Gurevych. (2017). “Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging.” In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 338-348.
- QUOTE: … We use a BiLSTM-network for sequence tagging as described in (Huang et al., 2015; Ma and Hovy, 2016; Lample et al., 2016). To be able to evaluate a large number of different network configurations, we optimized our implementation for efficiency, reducing by a factor of 6 the time required per epoch compared to Ma and Hovy (2016). …