CBOW NNLM Algorithm
A CBOW NNLM Algorithm is an NNLM algorithm that predicts a target word from the other words in its context window, which serve as the network's inputs.
- Context:
- It can (typically) share the input weight matrix across all context word positions.
- It can (typically) have a linear hidden (projection) layer, which is therefore often omitted from drawings of the network architecture.
- It can be a CBOW NNLM with Negative Sampling Algorithm.
- It can include Vocabulary Downsampling.
- …
- Example(s): the CBOW model in the word2vec System (a minimal illustrative sketch follows this list).
- Counter-Example(s): a Skip-Gram NNLM Algorithm (which predicts the context words from the target word).
- See: word2vec System.
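The following is a minimal, illustrative sketch of the CBOW forward pass (not the word2vec reference implementation): a shared input weight matrix projects each context word, the projections are averaged, and a log-linear output layer scores the middle word. All names, sizes, and values below are assumptions chosen for illustration.

```python
# Minimal CBOW forward-pass sketch (toy setup, assumed values; not the word2vec C code):
# a shared embedding matrix projects each context word, the projections are averaged,
# and a log-linear (softmax) output layer scores the middle word.
import numpy as np

rng = np.random.default_rng(0)
V, D = 10, 4                                # toy vocabulary size and embedding dimension
W_in = rng.normal(scale=0.1, size=(V, D))   # shared input->projection weights (all positions)
W_out = rng.normal(scale=0.1, size=(D, V))  # projection->output weights

def cbow_probs(context_ids):
    """Predict a distribution over the vocabulary for the middle word."""
    h = W_in[context_ids].mean(axis=0)       # average the context word projections
    scores = h @ W_out                       # log-linear output layer
    exp = np.exp(scores - scores.max())      # plain softmax; hierarchical softmax or
    return exp / exp.sum()                   # negative sampling would replace this step

# Example: four history and four future words (word ids are arbitrary in this toy setting).
context = [1, 2, 3, 4, 6, 7, 8, 9]
p = cbow_probs(context)
print(p.argmax(), p.max())
```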
References
2013
- https://code.google.com/p/word2vec/
- This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research.
- (Mikolov et al., 2013a) ⇒ Tomáš Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. (2013). “Efficient Estimation of Word Representations in Vector Space.” In: Proceedings of the International Conference on Learning Representations (ICLR) Workshop.
- QUOTE: The first proposed architecture is similar to the feedforward NNLM, where the non-linear hidden layer is removed and the projection layer is shared for all words (not just the projection matrix); thus, all words get projected into the same position (their vectors are averaged). We call this architecture a bag-of-words model as the order of words in the history does not influence the projection. Furthermore, we also use words from the future; we have obtained the best performance on the task introduced in the next section by building a log-linear classifier with four future and four history words at the input, where the training criterion is to correctly classify the current (middle) word. Training complexity is then: [math]\displaystyle{ Q = N \times D + D \times \log_2(V) \quad (4) }[/math] We denote this model further as CBOW, as unlike standard bag-of-words model, it uses continuous distributed representation of the context. The model architecture is shown at Figure 1. Note that the weight matrix between the input and the projection layer is shared for all word positions in the same way as in the NNLM.
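As a rough numeric illustration of the complexity expression quoted above, the sketch below evaluates Q for assumed parameter values (N = 8 context words, D = 300 dimensions, V = 1,000,000 vocabulary words); these values are chosen for illustration and are not taken from the paper.

```python
# Illustrative evaluation of the per-example training complexity Q = N*D + D*log2(V).
# The parameter values below are assumptions, not figures from Mikolov et al. (2013a).
import math

N, D, V = 8, 300, 1_000_000     # context words, embedding size, vocabulary size
Q = N * D + D * math.log2(V)
print(round(Q))                 # ≈ 8379; the D*log2(V) output term dominates at this vocabulary size
```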