Beam Search-based Decoding System
A Beam Search-based Decoding System is a decoding system that implements a beam search-based decoding algorithm.
- …
- Example(s):
- See: Beam Search, Beam Search-based System, Greedy Search Decoder.
References
2018
- https://machinelearningmastery.com/beam-search-decoder-natural-language-processing/
- QUOTE: Natural language processing tasks, such as caption generation and machine translation, involve generating sequences of words.
Models developed for these problems often operate by generating probability distributions across the vocabulary of output words and it is up to decoding algorithms to sample the probability distributions to generate the most likely sequences of words.
In this tutorial, you will discover the greedy search and beam search decoding algorithms that can be used on text generation problems.
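The following is a minimal sketch of the two decoders the tutorial contrasts, operating on a toy matrix of per-step probability distributions over a small vocabulary. The data, function names, and beam width are illustrative assumptions, not taken from the tutorial.

```python
import numpy as np

def greedy_decode(probs):
    """Pick the single most likely word index at each step."""
    return [int(np.argmax(step)) for step in probs]

def beam_search_decode(probs, beam_width=3):
    """Keep the beam_width best partial sequences, scored by summed log probability."""
    beams = [([], 0.0)]  # (token index sequence, cumulative log probability)
    for step in probs:
        candidates = []
        for seq, score in beams:
            for idx, p in enumerate(step):
                candidates.append((seq + [idx], score + np.log(p)))
        # discard all but the beam_width highest-scoring candidates
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

# Toy example: 5 decoding steps over a 5-word vocabulary.
probs = np.array([[0.1, 0.2, 0.3, 0.4, 0.05],
                  [0.5, 0.4, 0.3, 0.2, 0.10],
                  [0.1, 0.2, 0.3, 0.4, 0.50],
                  [0.5, 0.4, 0.3, 0.2, 0.10],
                  [0.1, 0.2, 0.3, 0.4, 0.50]])
probs = probs / probs.sum(axis=1, keepdims=True)  # normalize each step

print(greedy_decode(probs))
print(beam_search_decode(probs, beam_width=3))
```

With a beam width of 1 the beam search reduces to the greedy decoder; wider beams trade extra computation for a better approximation of the most likely sequence.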
2016
- (Xie et al., 2016) ⇒ Ziang Xie, Anand Avati, Naveen Arivazhagan, Dan Jurafsky, and Andrew Y. Ng. (2016). “Neural Language Correction with Character-Based Attention.” In: CoRR, abs/1603.09727.
- QUOTE: ... For inference we use a beam search decoder combining the neural network and the language model likelihood. Similar to Hannun et al. (2014), at step k, we rank the hypotheses on the beam using the score s_k(y_{1:k}|x) = log P_NN(y_{1:k}|x) + λ log P_LM(y_{1:k}), where the hyper-parameter λ determines how much the language model is weighted. To avoid penalizing longer hypotheses, we additionally normalize scores by the number of words in the hypothesis |y|. Since decoding is done at the character level, the language model probability P_LM(·) is only incorporated after a space or end-of-sentence symbol is encountered. …
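A minimal sketch of the scoring rule quoted above: the neural model's log probability is combined with a λ-weighted language-model term and normalized by the hypothesis length in words. The function name, candidate strings, and log-probability values are made-up illustrations, not the authors' code.

```python
def hypothesis_score(nn_logprob, lm_logprob, num_words, lam=0.5):
    """Rank a hypothesis by the weighted sum of neural-model and language-model
    log probabilities, normalized by its length in words (lam is the LM weight)."""
    return (nn_logprob + lam * lm_logprob) / max(num_words, 1)

# Illustrative comparison of two candidate corrections (log probabilities are invented).
candidates = [
    # (text, log P_NN, log P_LM)
    ("their going home", -4.1, -9.5),
    ("they're going home", -4.3, -7.2),
]
lam = 0.5
ranked = sorted(
    candidates,
    key=lambda c: hypothesis_score(c[1], c[2], len(c[0].split()), lam),
    reverse=True,
)
for text, nn, lm in ranked:
    print(text, hypothesis_score(nn, lm, len(text.split()), lam))
```

In the character-level setting described in the quote, the language-model term would only be added once a space or end-of-sentence symbol completes a word.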
2014
- (Sutskever et al., 2014) ⇒ Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. (2014). “Sequence to Sequence Learning with Neural Networks.” In: Advances in Neural Information Processing Systems.
- QUOTE: We search for the most likely translation using a simple left-to-right beam search decoder which maintains a small number B of partial hypotheses, where a partial hypothesis is a prefix of some translation. At each timestep we extend each partial hypothesis in the beam with every possible word in the vocabulary. This greatly increases the number of the hypotheses so we discard all but the B most likely hypotheses according to the model’s log probability. As soon as the “<EOS>” symbol is appended to a hypothesis, it is removed from the beam and is added to the set of complete hypotheses. While this decoder is approximate, it is simple to implement. Interestingly, our system performs well even with a beam size of 1, and a beam of size 2 provides most of the benefits of beam search (Table 1).
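Below is a minimal sketch of the left-to-right beam search the quote describes: B partial hypotheses are each extended with every vocabulary word, only the B most likely extensions are kept, and any hypothesis that emits "<EOS>" is moved to the set of complete hypotheses. The `log_prob` model interface and the toy bigram table are assumptions made for illustration.

```python
import math

def beam_search(log_prob, vocab, beam_size=2, max_len=20, eos="<EOS>"):
    """log_prob(prefix, word) -> model log probability of `word` given `prefix`.
    Returns hypotheses sorted by total log probability."""
    beams = [([], 0.0)]          # (prefix, cumulative log probability)
    completed = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for word in vocab:
                candidates.append((prefix + [word], score + log_prob(prefix, word)))
        # discard all but the beam_size most likely partial hypotheses
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == eos:
                completed.append((prefix, score))   # hypothesis is finished
            else:
                beams.append((prefix, score))
        if not beams:
            break
    return sorted(completed + beams, key=lambda c: c[1], reverse=True)

# Toy model: a fixed bigram table over a small vocabulary.
table = {
    (None, "the"): 0.6, (None, "a"): 0.3, (None, "<EOS>"): 0.1,
    ("the", "cat"): 0.7, ("the", "dog"): 0.2, ("the", "<EOS>"): 0.1,
    ("a", "cat"): 0.4, ("a", "dog"): 0.5, ("a", "<EOS>"): 0.1,
    ("cat", "<EOS>"): 0.9, ("cat", "the"): 0.05, ("cat", "a"): 0.05,
    ("dog", "<EOS>"): 0.9, ("dog", "the"): 0.05, ("dog", "a"): 0.05,
}
vocab = ["the", "a", "cat", "dog", "<EOS>"]

def log_prob(prefix, word):
    prev = prefix[-1] if prefix else None
    return math.log(table.get((prev, word), 1e-9))

print(beam_search(log_prob, vocab, beam_size=2))
```

Setting beam_size=1 gives the greedy decoder mentioned in the quote; as the authors note, even small beams (size 1 or 2) recover most of the benefit of beam search.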