Bidirectional LSTM/CRF (BiLSTM-CRF) Training System
A Bidirectional LSTM/CRF (BiLSTM-CRF) Training System is a bidirectional LSTM training system that includes a CRF training system and implements a bidirectional LSTM/CRF training algorithm to train a BiLSTM-CRF model.
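As a rough orientation, the core architecture such a system trains can be sketched in PyTorch as a bidirectional LSTM that produces per-token emission scores, combined with a learned tag-transition matrix used by the CRF layer. The class and parameter names below (SimpleBiLSTMCRF, tagset_size, hidden_dim) are illustrative assumptions, not code from any specific system referenced on this page:
import torch
import torch.nn as nn

class SimpleBiLSTMCRF(nn.Module):
    # Illustrative sketch: BiLSTM emission scores plus CRF transition parameters.
    def __init__(self, vocab_size, tagset_size, embedding_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embedding_dim)
        # Bidirectional LSTM; hidden_dim is split across the two directions.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim // 2,
                            bidirectional=True, batch_first=True)
        # Linear layer mapping BiLSTM states to per-tag emission scores.
        self.emission = nn.Linear(hidden_dim, tagset_size)
        # CRF parameters: transitions[i, j] = score of moving from tag j to tag i.
        self.transitions = nn.Parameter(torch.randn(tagset_size, tagset_size))

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> emissions: (batch, seq_len, tagset_size)
        states, _ = self.lstm(self.embed(token_ids))
        return self.emission(states)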
- Context:
- It can (typically) include a Bidirectional LSTM Training System.
- It can (typically) include a CRF Training System.
- It can range from being a Shallow Bidirectional LSTM Training System to being a Deep Bidirectional LSTM Training System.
- …
- Example(s):
- LopezGG/NN_NER_tensorFlow [1].
- D3NER [2].
- a Bidirectional LSTM-CNN-CRF Training System, such as the BiLSTM-CNN-CRF (Reimers & Gurevych, 2017) Training System [3]:
- BiLSTM-CNN-CRF Training System for NER in German using the GermEval 2014 dataset,
- BiLSTM-CNN-CRF network training for part-of-speech tagging using the Universal Dependencies dataset,
- BiLSTM-CNN-CRF network training for Chunking in English using the CoNLL 2000 dataset,
- Multi-task learning using a BiLSTM-CNN-CRF implementation.
- an Att-BiLSTM-CRF Training System [4]:
- Training a BiLSTM-CRF model using a training set (trainfile), a development set (devfile), a testing set (testfile), and a word embedding model (word_embedding.model); a purely illustrative sketch of such a command-line entry point follows this example:
python train.py --train trainfile --dev devfile --test testfile --pre_emb word_embedding.model
- Tagging documents using a pretrained BiLSTM-CRF model and tokenized input data (inputfile) that must contain one document per line; the output data are saved in outputfile:
python tagger.py --model BiLSTM-CRF.model --input inputfile --output outputfile
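As a purely hypothetical sketch of how such an entry point might wire the flags shown above to a training routine (the argparse layout below is an assumption for illustration, not code from the Att-BiLSTM-CRF repository):
import argparse

# Hypothetical argument parsing mirroring the flags shown above; the actual
# Att-BiLSTM-CRF train.py may be organized differently.
parser = argparse.ArgumentParser(description="Train a BiLSTM-CRF tagger")
parser.add_argument("--train", required=True, help="path to the training set")
parser.add_argument("--dev", required=True, help="path to the development set")
parser.add_argument("--test", required=True, help="path to the testing set")
parser.add_argument("--pre_emb", help="path to a pretrained word embedding model")
args = parser.parse_args()
print(args.train, args.dev, args.test, args.pre_emb)  # these paths would be handed to the trainer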
- a Kaniblu PyTorch-BiLSTM-CRF (Kang Min Yoo, 2017) [5]:
- Usage:
python train.py --input-path sentences.txt --input-path pos.txt --label-path labels.txt
- the PyTorch Advanced Tutorial on the Bi-LSTM CRF, by Robert Guthrie; implementation notes [6]:
import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.optim as optim

def argmax(vec):
    # return the index of the highest-scoring tag in vec
    (...)

class BiLSTM_CRF(nn.Module):
    # BiLSTM emission scores combined with CRF transition scores
    (...)

model = BiLSTM_CRF(len(word_to_ix), tag_to_ix, EMBEDDING_DIM, HIDDEN_DIM)
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
(...)
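A minimal sketch of the tutorial-style training loop might continue the snippet above as follows, assuming the tutorial's prepare_sequence helper, training_data list, and the model's neg_log_likelihood method are defined as in the full tutorial code (the epoch count is an arbitrary placeholder):
# Minimal sketch of the tutorial-style training loop.
for epoch in range(300):
    for sentence, tags in training_data:
        model.zero_grad()  # clear accumulated gradients
        sentence_in = prepare_sequence(sentence, word_to_ix)
        targets = torch.tensor([tag_to_ix[t] for t in tags], dtype=torch.long)
        loss = model.neg_log_likelihood(sentence_in, targets)  # CRF negative log-likelihood
        loss.backward()
        optimizer.step()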
- Counter-Example(s):
- See: LSTM System, neuroner.com, Conditional Random Field, Bidirectional Recurrent Neural Network, Dynamic Neural Network.
References
2018a
- (Reimers & Gurevych, 2018) ⇒ BiLSTM-CNN-CRF repository: https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf Retrieved: 2018-06-10.
- QUOTE: This repository contains a BiLSTM-CRF implementation that is used for NLP Sequence Tagging (for example POS-tagging, Chunking, or Named Entity Recognition). The implementation is based on Keras 2.1.5 and can be run with Tensorflow 1.7.0 as backend. It was optimized for Python 3.5/3.6. It does not work with Python 2.7.
The architecture is described in our papers:
- The implementation is highly configurable, so you can tune the different hyperparameters easily. You can use it for Single Task Learning as well as different options for Multi-Task Learning. You can also use it for Multilingual Learning by using multilingual word embeddings. …
2018b
- (Yoo, 2018) ⇒ BiLSTM-CRF on PyTorch repository: https://github.com/kaniblu/pytorch-bilstmcrf Retrieved: 2018-06-10.
- QUOTE: An efficient BiLSTM-CRF implementation that leverages mini-batch operations on multiple GPUs (...)
Training
Prepare data first. Data must be supplied with separate text files for each input or target label type. Each line contains a single sequence, and each pair of tokens is separated with a space. For example, for the task of Named Entity Recognition using words and Part-of-Speech tags, the input and label files might be prepared as follows:
(sents.txt)
the fat rat sat on a mat
...
(pos.txt)
det adj noun verb prep det noun
...
(labels.txt)
O O B-Animal O O O B-Object
...
- Then the above input and label files are provided to train.py using --input-path and --label-path respectively:
python train.py --input-path sents.txt --input-path pos.txt --label-path labels.txt
You might need to set up several more parameters in order to make it work. Check out examples/atis for an example of training a simple BiLSTM-CRF model with the ATIS dataset. Run python preprocess.py in the example directory to convert the dataset to a train.py-friendly format, then run python ../../train.py --config train-atis.yml to see a running example. The example configuration assumes that standalone tensorboard is installed (you could turn it off in the configuration file).
For more information on the configurations, check out python train.py --help.
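As an illustrative (and hypothetical) reading routine for line-aligned, whitespace-separated files such as the sents.txt, pos.txt, and labels.txt above, the sequences might be loaded along these lines (read_sequences is an assumed helper, not part of the repository):
def read_sequences(path):
    # One whitespace-separated token sequence per line (illustrative helper).
    with open(path, encoding="utf-8") as f:
        return [line.split() for line in f if line.strip()]

sents = read_sequences("sents.txt")
pos_tags = read_sequences("pos.txt")
labels = read_sequences("labels.txt")
assert len(sents) == len(pos_tags) == len(labels)  # files must be line-aligned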
2017a
- (PyTorch, 2017) ⇒ "Bi-LSTM Conditional Random Field Discussion." In: "Advanced: Making Dynamic Decisions and the Bi-LSTM CRF."
- QUOTE: For this section, we will see a full, complicated example of a Bi-LSTM Conditional Random Field for named-entity recognition. The LSTM tagger above is typically sufficient for part-of-speech tagging, but a sequence model like the CRF is really essential for strong performance on NER. Familiarity with CRF’s is assumed. Although this name sounds scary, all the model is is a CRF but where an LSTM provides the features. This is an advanced model though, far more complicated than any earlier model in this tutorial. If you want to skip it, that is fine. To see if you’re ready, see if you can:
- Write the recurrence for the Viterbi variable at step i for tag k.
- Modify the above recurrence to compute the forward variables instead.
- Modify again the above recurrence to compute the forward variables in log-space (hint: log-sum-exp)
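As a minimal sketch of the log-space forward recurrence the quoted tutorial alludes to (illustrative only; emissions[t, i] is assumed to be the BiLSTM score of tag i at step t, and transitions[i, j] an assumed CRF score for moving from tag j to tag i):
import torch

def log_sum_exp(vec):
    # Numerically stable log-sum-exp over the last dimension.
    m = vec.max(dim=-1, keepdim=True).values
    return (m + (vec - m).exp().sum(dim=-1, keepdim=True).log()).squeeze(-1)

def forward_algorithm(emissions, transitions):
    # emissions: (seq_len, num_tags); transitions[i, j]: score of tag j -> tag i.
    alpha = emissions[0]  # log-space forward variables at the first step
    for t in range(1, emissions.size(0)):
        # alpha[j] + transitions[i, j] + emissions[t, i], reduced (in log-space) over previous tag j
        alpha = log_sum_exp(alpha.unsqueeze(0) + transitions + emissions[t].unsqueeze(1))
    return log_sum_exp(alpha)  # log partition function log Z

emissions = torch.randn(7, 5)     # e.g., 7 tokens, 5 tags
transitions = torch.randn(5, 5)
log_Z = forward_algorithm(emissions, transitions)
Replacing log_sum_exp with a max over the previous tag gives the corresponding Viterbi recurrence mentioned in the quote.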
2017b
- (Github, 2017) ⇒ LSTM-CRF in PyTorch: https://github.com/threelittlemonkeys/lstm-crf-pytorch
- QUOTE: A PyTorch implementation of bidirectional LSTM-CRF for sequence tagging, adapted from the PyTorch tutorial.
Supported features:
- Mini-batch training with CUDA.
- Vectorized computation of CRF loss
2017c
- (Reimers & Gurevych, 2017a) ⇒ Nils Reimers, and Iryna Gurevych. (2017). “Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging.” In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 338-348.
- QUOTE: … We use a BiLSTM-network for sequence tagging as described in (Huang et al., 2015; Ma and Hovy, 2016; Lample et al., 2016). To be able to evaluate a large number of different network configurations, we optimized our implementation for efficiency, reducing by a factor of 6 the time required per epoch compared to Ma and Hovy (2016). …