Boltzmann Machine (BM)
A Boltzmann Machine (BM) is a stochastic recurrent neural network of symmetrically connected binary units whose equilibrium sampling distribution is the Boltzmann Distribution.
- Context:
- It can be trained by a Boltzmann Machine Training System that implements a Boltzmann Machine Training Algorithm.
- It can be represented as a Log-linear Markov Random Field.
- It is usually assumed that the energy function is linear in its free parameters (a sketch of such an energy function follows this outline).
- Example(s):
- the algorithm proposed in (Ackley et al., 1985).
- a Restricted Boltzmann Machine.
- …
- Counter-Example(s):
- a Hopfield Network (its deterministic counterpart).
- See: Recurrent Neural Network, Generative Model, Hebbian, Partition Function.
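The context items above can be made concrete with the standard energy-based formulation. The following is a minimal sketch in conventional notation (binary unit states s_i, symmetric weights w_ij, biases b_i, partition function Z); the symbols are generic conventions rather than notation taken from the references below.

```latex
% Energy of a joint binary state s, linear in the free parameters w_{ij} and b_i
E(s) = -\sum_{i<j} w_{ij}\, s_i s_j \;-\; \sum_i b_i s_i

% Boltzmann (Gibbs) distribution over states, with partition function Z
P(s) = \frac{e^{-E(s)}}{Z}, \qquad Z = \sum_{s'} e^{-E(s')}
```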
References
2017a
- (Hinton, 2017) ⇒ Hinton G. (2017). "Boltzmann Machines". In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA
- QUOTE: A Boltzmann machine is a network of symmetrically connected, neuron-like units that make stochastic decisions about whether to be on or off. Boltzmann machines have a simple learning algorithm (Hinton and Sejnowski 1983 [1]) that allows them to discover interesting features that represent complex regularities in the training data. The learning algorithm is very slow in networks with many layers of feature detectors, but it is fast in “restricted Boltzmann machines” that have a single layer of feature detectors. Many hidden layers can be learned efficiently by composing restricted Boltzmann machines, using the feature activations of one as the training data for the next.
Boltzmann machines are used to solve two quite different computational problems. For a search problem, the weights on the connections are fixed and are used to represent a cost function. The stochastic dynamics of a Boltzmann machine then allow it to sample binary state vectors that have low values of the cost function. For a learning problem, the Boltzmann machine is shown a set of binary data vectors, and it must learn to generate these vectors with high probability. To do this, it must find weights on the connections so that relative to other possible binary vectors, the data vectors have low values of the cost function. To solve a learning problem, Boltzmann machines make many small updates to their weights, and each update requires them to solve many different search problems.
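The learning procedure described in this quote can be sketched in a few lines. The snippet below is a minimal NumPy illustration of the classic update rule Δw_ij ∝ ⟨s_i s_j⟩_data − ⟨s_i s_j⟩_model, restricted for brevity to a fully visible machine (no hidden units), with the free-phase expectation approximated by a short Gibbs chain; all function and variable names are illustrative and are not taken from Hinton's text.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_step(s, W, b):
    """One full sweep of stochastic unit updates at temperature T = 1."""
    for i in range(len(s)):
        delta_e = W[i] @ s + b[i]              # energy drop if unit i turns on (diagonal of W is zero)
        p_on = 1.0 / (1.0 + np.exp(-delta_e))
        s[i] = 1.0 if rng.random() < p_on else 0.0
    return s

def boltzmann_update(data, W, b, lr=0.05, n_steps=20):
    """Delta w_ij proportional to <s_i s_j>_data minus <s_i s_j>_model (fully visible sketch)."""
    # Clamped phase: correlations measured directly on the training vectors.
    data_corr = data.T @ data / len(data)
    data_mean = data.mean(axis=0)

    # Free phase: let the network run and sample its own states.
    s = rng.integers(0, 2, size=W.shape[0]).astype(float)
    samples = []
    for _ in range(n_steps):
        s = gibbs_step(s, W, b)
        samples.append(s.copy())
    samples = np.array(samples)
    model_corr = samples.T @ samples / len(samples)
    model_mean = samples.mean(axis=0)

    # Hebbian / anti-Hebbian update on symmetric weights and biases.
    W += lr * (data_corr - model_corr)
    np.fill_diagonal(W, 0.0)
    b += lr * (data_mean - model_mean)
    return W, b

# Toy usage: three binary training vectors over four units.
data = np.array([[1, 0, 1, 0], [1, 0, 1, 1], [0, 1, 0, 1]], dtype=float)
W, b = np.zeros((4, 4)), np.zeros(4)
for _ in range(100):
    W, b = boltzmann_update(data, W, b)
```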
2017b
- (The Asimov Institute, 2017) ⇒ http://asimovinstitute.org/neural-network-zoo/
- QUOTE: Boltzmann machines (BM) are a lot like HNs, but some neurons are marked as input neurons and others remain “hidden”. The input neurons become output neurons at the end of a full network update. It starts with random weights and learns through back-propagation, or more recently through contrastive divergence (a Markov chain is used to determine the gradients between two informational gains). Compared to a HN, the neurons mostly have binary activation patterns. As hinted by being trained by MCs, BMs are stochastic networks. The training and running process of a BM is fairly similar to a HN: one sets the input neurons to certain clamped values after which the network is set free (it doesn’t get a sock). While free the cells can get any value and we repetitively go back and forth between the input and hidden neurons. The activation is controlled by a global temperature value, which if lowered lowers the energy of the cells. This lower energy causes their activation patterns to stabilise. The network reaches an equilibrium given the right temperature.
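The clamped/free procedure and temperature-controlled activation alluded to in this quote can be illustrated with a small toy sketch: each free unit is switched on with probability σ(ΔE_i / T) while the temperature is lowered toward equilibrium. The helper names and the linear annealing schedule below are assumptions for illustration, not taken from the Asimov Institute page.

```python
import numpy as np

rng = np.random.default_rng(1)

def update_unit(i, s, W, b, T):
    """Stochastically set unit i on with probability sigma(energy_gap / T)."""
    energy_gap = W[i] @ s + b[i]                   # drop in energy if unit i turns on
    p_on = 1.0 / (1.0 + np.exp(-energy_gap / T))
    s[i] = 1.0 if rng.random() < p_on else 0.0

def anneal(s, W, b, clamped, T_start=5.0, T_end=0.5, sweeps=50):
    """Run the free units toward equilibrium while slowly lowering the temperature."""
    free = [i for i in range(len(s)) if i not in clamped]
    for T in np.linspace(T_start, T_end, sweeps):  # descending schedule
        for i in free:
            update_unit(i, s, W, b, T)
    return s

# Usage: clamp the first two units to an input pattern and let the rest settle.
s = np.array([1.0, 0.0, 1.0, 0.0])
W, b = np.zeros((4, 4)), np.zeros(4)
s = anneal(s, W, b, clamped={0, 1})
```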
2014
- (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/Boltzmann_machine Retrieved:2014-9-23.
- QUOTE: A Boltzmann machine is a type of stochastic recurrent neural network invented by Geoffrey Hinton and Terry Sejnowski in 1985. Boltzmann machines can be seen as the stochastic, generative counterpart of Hopfield nets. They were one of the first examples of a neural network capable of learning internal representations, and are able to represent and (given sufficient time) solve difficult combinatoric problems. However, due to a number of issues discussed below, Boltzmann machines with unconstrained connectivity have not proven useful for practical problems in machine learning or inference. They are still theoretically intriguing, however, due to the locality and Hebbian nature of their training algorithm, as well as their parallelism and the resemblance of their dynamics to simple physical processes. If the connectivity is constrained, the learning can be made efficient enough to be useful for practical problems.
They are named after the Boltzmann distribution in statistical mechanics, which is used in their sampling function.
2012
- (Deeplearning.net, 2012) ⇒ http://deeplearning.net/tutorial/rbm.html
- QUOTE: Boltzmann Machines (BMs) are a particular form of log-linear Markov Random Field (MRF), i.e., for which the energy function is linear in its free parameters. To make them powerful enough to represent complicated distributions (i.e., go from the limited parametric setting to a non-parametric one), we consider that some of the variables are never observed (they are called hidden). By having more hidden variables (also called hidden units), we can increase the modeling capacity of the Boltzmann Machine (BM). Restricted Boltzmann Machines further restrict BMs to those without visible-visible and hidden-hidden connections. A graphical depiction of an RBM is shown below.
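The restriction described in this quote (no visible-visible and no hidden-hidden connections) can be written down directly. The following is a sketch in the notation commonly used for RBMs (visible vector v, hidden vector h, weight matrix W, biases b and c); it is reconstructed from the quoted description rather than copied from the tutorial itself.

```latex
% RBM energy: no visible-visible or hidden-hidden terms, so it is bilinear in (v, h)
% and linear in the free parameters W, b, c.
E(v, h) = -\,b^{\top} v \;-\; c^{\top} h \;-\; h^{\top} W v

% Marginal over visible vectors, summing out the hidden units.
P(v) = \frac{1}{Z} \sum_{h} e^{-E(v, h)}, \qquad Z = \sum_{v', h'} e^{-E(v', h')}
```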
1989
- (Liou & Lin, 1989) ⇒ C.-Y. Liou, and S.-L. Lin. (1989). “The other variant Boltzmann machine". In: Proceedings of the International Joint Conference on Neural Networks. doi:10.1109/IJCNN.1989.118618.
- ABSTRACT: A novel theory for studying the learning behavior of a neural network which is formed by interconnecting neurons is presented. This learning theory constitutes a new approach to the Boltzmann machine. The central idea is to minimize one of the two cross-entropy-like criteria, the cross-entropy and the reversed cross-entropy; the latter is used by Ackley et al. (Cognitive Sci., vol.9, p.147-59, 1985) in deriving the Boltzmann machine. The results derived by the present approach are closely related to those obtained by Ackley et al., with several significant modifications in the algorithm. A detailed discussion of the new algorithm, which is shown to be a probability-weighted version of the algorithm by Ackley et al., is presented.
1986
- (Hinton & Sejnowski, 1986) ⇒ Hinton, G. E., & Sejnowski, T. J. (1986). "Learning and releaming in boltzmann machines" (PDF). Parallel distributed processing: Explorations in the microstructure of cognition, 1(282-317), 2.
- QUOTE: Many of the chapters in this volume make use of the ability of a parallel network to perform cooperative searches for good solutions to problems. The basic idea is simple: The weights on the connections between processing units encode knowledge about how things normally (...)
1985
- (Ackley et al., 1985) ⇒ David H. Ackley, Geoffrey E. Hinton, and Terrence J. Sejnowski. (1985). “A Learning Algorithm for Boltzmann Machines". In: Cognitive Science, 9(1). doi:10.1207/s15516709cog0901_7.
- ABSTRACT: The computational power of massively parallel networks of simple processing elements resides in the communication bandwidth provided by the hardware connections between elements. These connections can allow a significant fraction of the knowledge of the system to be applied to an instance of a problem in a very short time. One kind of computation for which massively parallel networks appear to be well suited is large constraint satisfaction searches, but to use the connections efficiently two conditions must be met: First, a search technique that is suitable for parallel networks must be found. Second, there must be some way of choosing internal representations which allow the preexisting hardware connections to be used efficiently for encoding the constraints in the domain being searched. We describe a general parallel search method, based on statistical mechanics, and we show how it leads to a general learning rule for modifying the connection strengths so as to incorporate knowledge about a task domain in an efficient way. We describe some simple examples in which the learning algorithm creates internal representations that are demonstrably the most efficient way of using the preexisting connectivity structure.
- ↑ Hinton GE, Sejnowski TJ (1983) “Optimal perceptual inference” (PDF). In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, pp 448–453