Relevance Vector Machine (RVM) Algorithm
A Relevance Vector Machine (RVM) Algorithm is a probabilistic supervised learning algorithm that uses Bayesian inference to obtain sparse (parsimonious) solutions for regression and probabilistic classification tasks.
- Context:
- It can range from being a Relevance Vector Machine Regression Algorithm to being a Relevance Vector Machine Classification Algorithm.
- It can be trained with an Expectation Maximization (EM)-like Algorithm (unlike an SVM, which typically uses a Sequential Minimal Optimization Algorithm), and is therefore at risk of converging to a local minimum.
- Example(s):
- Counter-Example(s):
- a Support Vector Machine (SVM) Algorithm, which is non-probabilistic and requires cross-validation to set its free parameters.
- See: Bayesian Analysis, Automatic Relevance Determination, Sparse Bayesian Learning System, Sparse Bayesian Regression, Machine Learning, Bayesian Inference, Occam's Razor, Regression Analysis, Probabilistic Classification, Journal of Machine Learning Research, Support Vector Machine, Gaussian Process, Covariance Function, Kernel Function.
----
References
2019
- (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Relevance_vector_machine Retrieved:2019-10-4.
- In mathematics, a Relevance Vector Machine (RVM) is a machine learning technique that uses Bayesian inference to obtain parsimonious solutions for regression and probabilistic classification. The RVM has an identical functional form to the support vector machine, but provides probabilistic classification. It is actually equivalent to a Gaussian process model with covariance function:
[math]\displaystyle{ k(\mathbf{x},\mathbf{x'}) = \sum_{j=1}^N \frac{1}{\alpha_j} \varphi(\mathbf{x},\mathbf{x}_j)\varphi(\mathbf{x}',\mathbf{x}_j) }[/math]
where [math]\displaystyle{ \varphi }[/math] is the kernel function (usually Gaussian), [math]\displaystyle{ \alpha_j }[/math] are the precisions (inverse variances) of the prior on the weight vector [math]\displaystyle{ w \sim N(0,\alpha^{-1}I) }[/math], and [math]\displaystyle{ \mathbf{x}_1,\ldots,\mathbf{x}_N }[/math] are the input vectors of the training set. Compared to that of support vector machines (SVM), the Bayesian formulation of the RVM avoids the set of free parameters of the SVM (that usually require cross-validation-based post-optimizations). However, RVMs use an expectation maximization (EM)-like learning method and are therefore at risk of local minima. This is unlike the standard sequential minimal optimization (SMO)-based algorithms employed by SVMs, which are guaranteed to find a global optimum (of the convex problem). The relevance vector machine is patented in the United States by Microsoft.
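To make the covariance function above concrete, here is a minimal NumPy sketch (our illustration, not code from any cited source) that evaluates [math]\displaystyle{ k(\mathbf{x},\mathbf{x'}) }[/math] with a Gaussian basis function for [math]\displaystyle{ \varphi }[/math] and made-up values for the precisions [math]\displaystyle{ \alpha_j }[/math]:

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    """Gaussian basis function phi(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def rvm_covariance(x, x_prime, centers, alpha, gamma=1.0):
    """k(x, x') = sum_j (1/alpha_j) * phi(x, x_j) * phi(x', x_j)."""
    return sum(
        (1.0 / a_j) * rbf(x, x_j, gamma) * rbf(x_prime, x_j, gamma)
        for a_j, x_j in zip(alpha, centers)
    )

# Toy example: three training inputs; a large alpha_j suppresses basis j.
centers = [np.array([0.0]), np.array([1.0]), np.array([2.0])]
alpha = [1.0, 0.5, 1e6]  # illustrative precision values only
print(rvm_covariance(np.array([0.5]), np.array([0.7]), centers, alpha))
```

Note that the rank of this covariance is at most N (the number of basis centers), which is what makes the corresponding Gaussian process degenerate.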
2017
- (Jia et al., 2017) ⇒ Yuheng Jia, Sam Kwong, Wenhui Wu, Wei Gao, and Ran Wang (2017, September). "Generalized Relevance Vector Machine". In: 2017 Intelligent Systems Conference (IntelliSys) (pp. 638-645). IEEE.
- QUOTE: This paper considers the generalized version of the relevance vector machine (RVM), which is a sparse Bayesian kernel machine for classification and ordinary regression. Generalized RVM (GRVM) follows the work of the generalized linear model (GLM), which is a natural generalization of the ordinary linear regression model and shares a common approach to estimate the parameters. GRVM inherits the advantages of GLM, i.e., unified model structure, same training algorithm, and convenient task-specific model design. It also inherits the advantages of RVM, i.e., probabilistic output, extremely sparse solution, hyperparameter auto-estimation. Besides, GRVM extends RVM to a wider range of learning tasks beyond classification and ordinary regression by assuming that the conditional output belongs to an exponential family distribution (EFD). Since the EFD results in an intractable inference problem in Bayesian analysis, in this paper Laplace approximation is adopted to solve this problem, which is a common approach in Bayesian inference. Further, several task-specific models are designed based on GRVM, including models for ordinary regression, count data regression, classification, ordinal regression, etc. Besides, the relationship between GRVM and traditional RVM models is discussed (...)
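The Laplace approximation mentioned in the quote replaces an intractable posterior with a Gaussian centred at the posterior mode, with covariance given by the inverse Hessian at that mode. A minimal sketch of the generic idea (not the GRVM implementation; the log-posterior below is a hypothetical stand-in):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(w):
    """Hypothetical negative log-posterior: Gaussian prior + logistic-style likelihood."""
    return 0.5 * np.sum(w ** 2) + np.sum(np.log1p(np.exp(-w)))

# Find the posterior mode; BFGS also maintains an inverse-Hessian estimate,
# which plays the role of the covariance of the Gaussian approximation.
res = minimize(neg_log_posterior, x0=np.zeros(3), method="BFGS")
w_map = res.x              # mean of the Laplace (Gaussian) approximation
cov_approx = res.hess_inv  # approximate covariance at the mode
print(w_map, np.diag(cov_approx))
```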
2010
- (Saarela et al., 2010) ⇒ Matti Saarela, Tapio Elomaa, and Keijo Ruohonen (2010). "An analysis of relevance vector machine regression". In Advances in Machine Learning I (pp. 227-246). Springer, Berlin, Heidelberg. DOI: 10.1007/978-3-642-05177-7_11. ISBN: 978-3-642-05177-7.
- QUOTE: The relevance vector machine (RVM) is a Bayesian framework for learning sparse regression models and classifiers. Despite its popularity and practical success, no thorough analysis of its functionality exists. In this paper we consider the RVM in the case of regression models and present two kinds of analysis results: we derive a full characterization of the behavior of the RVM analytically when the columns of the regression matrix are orthogonal, and give some results concerning the scale and rotation invariance of the RVM. We also consider the practical implications of our results and present a scenario in which our results can be used to detect a potential weakness in the RVM framework.
2006
- (Tzikas et al., 2006) ⇒ Dimitris Tzikas, Liyang Wei, Aristidis Likas, Yongyi Yang, and Nikolas P. Galatsanos (2006). "A Tutorial On Relevance Vector Machines For Regression And Classification With Applications".
- QUOTE: Relevance vector machines (RVM) have recently attracted much interest in the research community because they provide a number of advantages. They are based on a Bayesian formulation of a linear model with an appropriate prior that results in a sparse representation. As a consequence, they can generalize well and provide inferences at low computational cost. In this tutorial we first present the basic theory of RVM for regression and classification, followed by two examples illustrating the application of RVM for object detection and classification (...)
The relevance vector machine (RVM) is a special case of a sparse linear model, where the basis functions are formed by a kernel function [math]\displaystyle{ \phi }[/math] centred at the different training points:
[math]\displaystyle{ y(x)= \sum_{i=1}^N w_i\phi(x-x_i) }[/math]
While this model is similar in form to the support vector machine (SVM), the kernel function here does not need to satisfy Mercer's condition, which requires [math]\displaystyle{ \phi }[/math] to be a continuous symmetric kernel of a positive integral operator.
Multi-kernel RVM is an extension of the simple RVM model. It consists of several different types of kernels [math]\displaystyle{ \phi_m }[/math], given by:
[math]\displaystyle{ y(x)= \sum_{m=1}^M \sum_{i=1}^N w_{m,i}\phi_m(x-x_i) }[/math]
The sparseness property enables automatic selection of the proper kernel at each location by pruning all irrelevant kernels, though it is possible that two different kernels remain at the same location.
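The multi-kernel design matrix implied by the formula above is easy to assemble. The sketch below (our illustration, not code from the tutorial) stacks the columns [math]\displaystyle{ \phi_m(x-x_i) }[/math] for two assumed kernel types, Gaussian and Laplacian, so that a sparse prior over the weights [math]\displaystyle{ w_{m,i} }[/math] can prune irrelevant kernel/location pairs:

```python
import numpy as np

def gaussian_kernel(r, gamma=1.0):
    return np.exp(-gamma * r ** 2)

def laplacian_kernel(r, gamma=1.0):
    return np.exp(-gamma * np.abs(r))

def multi_kernel_design_matrix(x, kernels):
    """Return the (N, M*N) matrix with columns phi_m(x - x_i).

    x: (N,) training inputs; kernels: list of M kernel functions of r = x - x_i.
    """
    diffs = x[:, None] - x[None, :]  # pairwise differences x - x_i
    return np.hstack([k(diffs) for k in kernels])

x = np.linspace(0.0, 1.0, 5)
Phi = multi_kernel_design_matrix(x, [gaussian_kernel, laplacian_kernel])
print(Phi.shape)  # (5, 10): N rows, one column per (kernel, training point) pair
```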
2005
- (Rasmussen & Quinonero-Candela, 2005) ⇒ Carl Edward Rasmussen, and Joaquin Quinonero-Candela (2005, August). "Healing the Relevance Vector Machine Through Augmentation". In: Proceedings of the 22nd International Conference on Machine learning (pp. 689-696). ACM.
- QUOTE: The Relevance Vector Machine (RVM) introduced by Tipping (2001) produces sparse solutions using an improper hierarchical prior and optimizing over hyperparameters. The RVM is exactly equivalent to a Gaussian Process, where the RVM hyperparameters are parameters of the GP covariance function (more on this in the discussion section). However, the covariance function of the RVM seen as a GP is degenerate: its rank is at most equal to the number of relevance vectors of the RVM. As a consequence, for localized basis functions, the RVM produces predictive distributions with properties opposite to what would be desirable. Indeed, the RVM is more certain about its predictions the further one moves away from the data it has been trained on. One would wish the opposite behaviour, as is the case with non-degenerate GPs, where the uncertainty of the predictions is minimal for test points in the regions of the input space where (training) data has been seen. For non-localized basis functions, the same undesired effect persists, although the intuition may be less clear, see the discussion.
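The pathology described above is easy to reproduce numerically: under the degenerate covariance function given in the 2019 excerpt, the prior variance [math]\displaystyle{ k(x_*,x_*) }[/math] collapses to zero as the test point moves away from all relevance vectors. A small sketch with Gaussian basis functions and made-up precisions:

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    return np.exp(-gamma * (x - z) ** 2)

def prior_variance(x_star, centers, alpha, gamma=1.0):
    """k(x*, x*) = sum_j phi(x*, x_j)^2 / alpha_j for the degenerate RVM covariance."""
    return sum(rbf(x_star, c, gamma) ** 2 / a for c, a in zip(centers, alpha))

centers, alpha = [0.0, 1.0, 2.0], [1.0, 1.0, 1.0]
for x_star in [1.0, 3.0, 5.0, 10.0]:
    print(x_star, prior_variance(x_star, centers, alpha))
# The variance shrinks towards zero far from the data: the RVM grows *more*
# certain away from its relevance vectors, the opposite of a non-degenerate GP.
```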
2004
- (Bishop, 2004) ⇒ Christopher M. Bishop. (2004). “Recent Advances in Bayesian Inference Techniques." Keynote Presentation at SIAM Conference on Data Mining.
- Relevance Vector Machine (Tipping, 1999)
- Bayesian alternative to support vector machine (SVM)
- Properties
- comparable error rates to SVM on new data
- no cross-validation to set complexity parameters
- applicable to wide choice of basis function
- multi-class classification
- probabilistic outputs
- dramatically fewer kernels (by an order of magnitude)
- but, slower to train than SVM
2001
- (Tipping, 2001) ⇒ Michael E. Tipping (2001). "Sparse Bayesian Learning and the Relevance Vector Machine". Journal of Machine Learning Research, 1(Jun), 211-244.
- QUOTE: Specifically, we adopt a fully probabilistic framework and introduce a prior over the model weights governed by a set of hyperparameters, one associated with each weight, whose most probable values are iteratively estimated from the data. Sparsity is achieved because in practice we find that the posterior distributions of many of the weights are sharply (indeed infinitely) peaked around zero. We term those training vectors associated with the remaining non-zero weights 'relevance' vectors, in deference to the principle of automatic relevance determination which motivates the presented approach (MacKay, 1994; Neal, 1996). The most compelling feature of the RVM is that, while capable of generalisation performance comparable to an equivalent SVM, it typically utilises dramatically fewer kernel functions (...)
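The iterative hyperparameter estimation described in the quote can be condensed into a short loop. The sketch below renders the type-II maximum likelihood re-estimation updates of Tipping (2001) for the regression case, with the noise precision held fixed for brevity (the full algorithm also re-estimates it):

```python
import numpy as np

def sparse_bayesian_regression(Phi, t, beta=100.0, n_iter=200, prune_at=1e6):
    """Sparse Bayesian learning for t ~ N(Phi @ w, 1/beta), per-weight priors alpha.

    Phi: (N, M) design matrix; t: (N,) targets. Returns the posterior weight
    mean, with pruned weights (alpha_j -> infinity) fixed at exactly zero.
    """
    N, M = Phi.shape
    alpha = np.ones(M)   # one precision hyperparameter per weight
    keep = np.arange(M)  # basis functions not yet pruned

    def posterior(keep):
        P = Phi[:, keep]
        Sigma = np.linalg.inv(np.diag(alpha[keep]) + beta * P.T @ P)  # posterior covariance
        mu = beta * Sigma @ P.T @ t                                   # posterior mean
        return mu, Sigma

    for _ in range(n_iter):
        mu, Sigma = posterior(keep)
        gamma = 1.0 - alpha[keep] * np.diag(Sigma)  # "well-determinedness" of each weight
        alpha[keep] = gamma / (mu ** 2 + 1e-12)     # re-estimate: alpha_j = gamma_j / mu_j^2
        keep = keep[alpha[keep] < prune_at]         # prune weights driven to zero

    w = np.zeros(M)
    w[keep], _ = posterior(keep)
    return w  # the non-zero entries mark the relevance vectors

# Usage: with a kernel design matrix Phi built from the training inputs (as in
# the covariance sketch above), the surviving columns identify the relevance vectors.
```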
2000a
- (Bishop & Tipping, 2000) ⇒ Christopher M. Bishop, and Michael E. Tipping (2000). "Variational Relevance Vector Machines". Proceedings of the Sixteenth conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., 2000.
- QUOTE: Recently Tipping [8] introduced the Relevance Vector Machine (RVM) which makes probabilistic predictions and yet which retains the excellent predictive performance of the support vector machine. It also preserves the sparseness property of the SVM. Indeed, for a wide variety of test problems it actually leads to models which are dramatically sparser than the corresponding SVM, while sacrificing little if anything in the accuracy of prediction (...)
As we have seen, the standard relevance vector machine of Tipping [8] estimates point values for the hyperparameters. In this paper we seek a more complete Bayesian treatment of the RVM through exploitation of variational methods.
2000b
- (Tipping, 2000) ⇒ Michael E. Tipping (2000). "The Relevance Vector Machine". In Advances in Neural Information Processing Systems (pp. 652-658).
- QUOTE: In this paper, we introduce the relevance vector machine (RVM), a probabilistic sparse kernel model identical in functional form to the SVM. Here we adopt a Bayesian approach to learning, where we introduce a prior over the weights governed by a set of hyperparameters, one associated with each weight, whose most probable values are iteratively estimated from the data. Sparsity is achieved because in practice we find that the posterior distributions of many of the weights are sharply peaked around zero. Furthermore, unlike the support vector classifier, the nonzero weights in the RVM are not associated with examples close to the decision boundary, but rather appear to represent 'prototypical' examples of classes. We term these examples 'relevance' vectors, in deference to the principle of automatic relevance determination (ARD) which motivates the presented approach [1] [2].