word2vec-like System

A word2vec-like System is a distributional word embedding training system that applies a word2vec algorithm (based on work by Tomáš Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean, et al.[1]).



References

2021

2016

Using such an encoding, there’s no meaningful comparison we can make between word vectors other than equality testing.
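The limitation can be seen in a minimal Python sketch, assuming the encoding in question is a one-hot encoding over a small vocabulary. The vocabulary and the helper names `one_hot` and `dot` are hypothetical and chosen only for illustration.

```python
def one_hot(word, vocabulary):
    """Return a list with 1 at the word's index and 0 everywhere else."""
    return [1 if w == word else 0 for w in vocabulary]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy vocabulary, not taken from any real corpus.
vocabulary = ["king", "queen", "man", "woman"]

king = one_hot("king", vocabulary)
queen = one_hot("queen", vocabulary)

# The dot product is 1 only when both vectors encode the same word,
# and 0 for every pair of different words, however related they are.
same_word = dot(king, king)
different_words = dot(king, queen)
```

Because every pair of distinct words scores the same, the encoding can say whether two words are identical but nothing about how similar they are.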

In word2vec, a distributed representation of a word is used. Take a vector with several hundred dimensions (say 1000). Each word is represented by a distribution of weights across those elements. So instead of a one-to-one mapping between an element in the vector and a word, the representation of a word is spread across all of the elements in the vector, and each element in the vector contributes to the definition of many words.

If I label the dimensions in a hypothetical word vector (there are no such pre-assigned labels in the algorithm of course), it might look a bit like this:

Such a vector comes to represent in some abstract way the ‘meaning’ of a word. And as we’ll see next, simply by examining a large corpus it’s possible to learn word vectors that are able to capture the relationships between words in a surprisingly expressive way. We can also use the vectors as inputs to a neural network.
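As a rough illustration of a distributed representation, the following Python sketch compares a few hand-made word vectors with cosine similarity. The dimension labels, the words, and every weight are invented for this example. They are not learned by word2vec, and real embeddings have no labeled dimensions.

```python
import math

# Hypothetical dimension labels, for readability only.
dimensions = ["royalty", "masculinity", "femininity", "age"]

# Hand-picked weights; a trained model would learn these from a corpus.
vectors = {
    "king":  [0.95, 0.90, 0.05, 0.70],
    "queen": [0.95, 0.05, 0.90, 0.65],
    "woman": [0.02, 0.01, 0.95, 0.50],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


queen_woman = cosine(vectors["queen"], vectors["woman"])
king_woman = cosine(vectors["king"], vectors["woman"])

# With these made-up weights, "queen" is closer to "woman" than "king" is,
# a graded comparison that equality testing alone cannot express.
print(queen_woman > king_woman)
```

Unlike an encoding that supports only equality testing, each element here contributes to several words at once, so similarity becomes a matter of degree.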

2015

2014

2014

2013

2013b

2013a