Negative-Sampling Algorithm
A Negative-Sampling Algorithm is a sampling algorithm that approximates a full multi-class (softmax) objective by contrasting each observed positive example against a small number of randomly drawn negative examples.
- Context:
- Instead of propagating the signal from the hidden layer to the whole output layer, only the output neuron that represents the positive class plus a few randomly sampled negative neurons are evaluated (see the sketch after this list).
- The output neurons are treated as independent logistic regression classifiers.
- It can make the training speed independent of the vocabulary size.
- See: Imbalanced Dataset, CBOW NNLM Algorithm, Skip-Gram NNLM Algorithm.
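The following is a minimal sketch of a single skip-gram-with-negative-sampling gradient step, illustrating the points above: only the rows for the positive word and the k sampled negatives are touched, and each output neuron is trained as an independent logistic classifier. The function and variable names are illustrative, not taken from any particular implementation.

```python
import numpy as np

def sgns_update(center_id, positive_id, negative_ids, W_in, W_out, lr=0.025):
    """One stochastic-gradient step of skip-gram with negative sampling (sketch).

    W_in  : (V, d) input/center-word embedding matrix
    W_out : (V, d) output/context-word embedding matrix
    Only the rows for the positive word and the sampled negative words are
    updated, so the cost per step does not grow with the vocabulary size V.
    """
    h = W_in[center_id]                      # hidden-layer activation, shape (d,)
    grad_h = np.zeros_like(h)

    # Each touched output neuron is an independent logistic regression:
    # target label 1 for the observed (positive) word, 0 for sampled negatives.
    pairs = [(positive_id, 1.0)] + [(n, 0.0) for n in negative_ids]
    for word_id, label in pairs:
        score = np.dot(W_out[word_id], h)
        pred = 1.0 / (1.0 + np.exp(-score))  # sigmoid
        err = pred - label                   # gradient of the logistic loss w.r.t. score
        grad_h += err * W_out[word_id]
        W_out[word_id] -= lr * err * h       # update only this output row

    W_in[center_id] -= lr * grad_h           # update the center-word row

# Hypothetical usage with made-up sizes and word ids:
rng = np.random.default_rng(0)
V, d = 10_000, 100
W_in = rng.normal(scale=0.1, size=(V, d))
W_out = np.zeros((V, d))
sgns_update(center_id=42, positive_id=7,
            negative_ids=[3, 991, 5012, 88, 6500],
            W_in=W_in, W_out=W_out)
```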
References
2014
- (Baroni et al., 2014) ⇒ Marco Baroni, Georgiana Dinu, and Germán Kruszewski. (2014). “Don't Count, Predict! a Systematic Comparison of Context-counting Vs. Context-predicting Semantic Vectors.” In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014).
- QUOTE: Hierarchical softmax is a computationally efficient way to estimate the overall probability distribution using an output layer that is proportional to log (unigram.perplexity (W)) instead of W (for W the vocabulary size). As an alternative, negative sampling estimates the probability of an output word by learning to distinguish it from draws from a noise distribution. The number of these draws (number of negative samples) is given by a parameter k. We test both hierarchical softmax and negative sampling with k values of 5 and 10. Very frequent words such as the or a are not very informative as context features.
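The quote above leaves the noise distribution unspecified. In the original word2vec implementation (Mikolov et al., 2013) the negatives are commonly drawn from the unigram distribution raised to the 3/4 power, which dampens very frequent words such as "the" or "a". The snippet below is a small sketch of that sampling step under this assumption; the function names are invented for illustration.

```python
import numpy as np

def build_noise_distribution(word_counts, power=0.75):
    """Unigram distribution raised to a power (0.75 in the word2vec code),
    flattening it so highly frequent words are drawn somewhat less often."""
    counts = np.asarray(word_counts, dtype=np.float64)
    probs = counts ** power
    return probs / probs.sum()

def draw_negatives(noise_probs, k, rng=None):
    """Draw k negative-sample word ids from the noise distribution."""
    rng = rng or np.random.default_rng()
    return rng.choice(len(noise_probs), size=k, p=noise_probs)

# Hypothetical usage with k = 5 negative samples per positive pair:
noise = build_noise_distribution([950, 120, 40, 7, 3])
negatives = draw_negatives(noise, k=5)
```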
- (Goldberg & Levy, 2014) ⇒ Yoav Goldberg, and Omer Levy. (2014). “word2vec Explained: Deriving Mikolov Et Al.'s Negative-sampling Word-embedding Method.” In: arXiv preprint arXiv:1402.3722.
- QUOTE: Mikolov et al. present the negative-sampling approach as a more efficient way of deriving word embeddings. While negative-sampling is based on the skip-gram model, it is in fact optimizing a different objective. What follows is the derivation of the negative-sampling objective.
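For orientation, the per-pair objective that this derivation works toward is usually written as follows (in the notation of Mikolov et al., 2013, not quoted from the entry above): for an input word w_I and an observed context word w_O, with k negative samples drawn from a noise distribution P_n,

```latex
\log \sigma\!\left({v'_{w_O}}^{\top} v_{w_I}\right)
  + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}
      \left[ \log \sigma\!\left(-{v'_{w_i}}^{\top} v_{w_I}\right) \right]
```

where \sigma is the logistic sigmoid, v and v' are the input and output embedding vectors, and maximizing the expression amounts to training independent logistic classifiers to separate the observed pair from the sampled noise pairs.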