Relevance Vector Machine (RVM) Algorithm

A [[Relevance Vector Machine (RVM) Algorithm]] is a [[probabilistic model|probabilistic]] [[supervised learning algorithm]] that uses [[Bayesian inference]] ...
* <B>Context:</B>
** It can range from being a [[Relevance Vector Machine Regression Algorithm]] to being a [[Relevance Vector Machine Classification Algorithm]].
* <B>Example(s):</B>
** a [[Generalized Relevance Vector Machine]] ([[#2017|Jia et al., 2017]]),
** a [[Variational Relevance Vector Machine]] ([[#2000a|Bishop & Tipping, 2000]]),
** a [[Multi-Kernel Relevance Vector Machine]] (e.g. [[#2006|Tzikas et al., 2006]]),
** …
* <B>Counter-Example(s):</B>
** a [[Support Vector Machine]].
* <B>See:</B> [[Bayesian Analysis]], [[Automatic Relevance Determination]], [[Sparse Bayesian Learning System]], [[Sparse Bayesian Regression]], [[Machine Learning]], [[Bayesian Inference]], [[Occam's Razor]], [[Regression Analysis]], [[Probabilistic Classification]], [[Journal of Machine Learning Research]], [[Support Vector Machine]], [[Gaussian Process]], [[Covariance Function]], [[Kernel Function]].
----
== References ==

=== 2019 ===
* (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Relevance_vector_machine Retrieved:2019-10-4.
** In [[mathematics]], a '''Relevance Vector Machine (RVM)''' is a [[machine learning]] technique that uses [[Bayesian inference]] to obtain [[Occam's razor|parsimonious]] solutions for [[Regression analysis|regression]] and [[probabilistic classification]]. The RVM has an identical functional form to the [[support vector machine]], but provides probabilistic classification. It is actually equivalent to a [[Gaussian process]] model with [[covariance function]]: <P><math> k(\mathbf{x},\mathbf{x'}) = \sum_{j=1}^N \frac{1}{\alpha_j} \varphi(\mathbf{x},\mathbf{x}_j)\varphi(\mathbf{x}',\mathbf{x}_j) </math> <P>where <math> \varphi </math> is the [[kernel function]] (usually Gaussian), <math> \alpha_j </math> are the variances of the prior on the weight vector <math> w \sim N(0,\alpha^{-1}I) </math>, and <math> \mathbf{x}_1,\ldots,\mathbf{x}_N </math> are the input vectors of the [[training set]]. Compared to [[support vector machine]]s (SVMs), the Bayesian formulation of the RVM avoids the SVM's set of free parameters (which usually require cross-validation-based post-optimization). However, RVMs use an [[expectation maximization]] (EM)-like learning method and are therefore at risk of local minima. This is unlike the standard [[sequential minimal optimization]] (SMO)-based algorithms employed by [[Support vector machine|SVM]]s, which are guaranteed to find a global optimum (of the convex problem). The relevance vector machine is [[Software patents under United States patent law|patented in the United States]] by [[Microsoft]].
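The covariance identity quoted above is easy to check numerically. The following is a minimal sketch (not taken from the cited article; the Gaussian kernel, its length scale, and the example <math>\alpha_j</math> values are illustrative assumptions):
<pre>
import numpy as np

def gaussian_kernel(x, xj, length_scale=1.0):
    # phi(x, x_j): Gaussian kernel centred on the training input x_j.
    return np.exp(-np.sum((x - xj) ** 2) / (2.0 * length_scale ** 2))

def rvm_covariance(x, x_prime, X_train, alpha):
    # k(x, x') = sum_j (1/alpha_j) * phi(x, x_j) * phi(x', x_j)
    return sum((1.0 / a) * gaussian_kernel(x, xj) * gaussian_kernel(x_prime, xj)
               for xj, a in zip(X_train, alpha))

X_train = np.array([[0.0], [1.0], [2.0]])   # training inputs x_1..x_N
alpha = np.array([1.0, 10.0, 1e9])          # a huge alpha_j makes basis j contribute ~0
print(rvm_covariance(np.array([0.5]), np.array([0.7]), X_train, alpha))
</pre>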


=== 2017 ===
* (Jia et al., 2017) ⇒ [[Yuheng Jia]], [[Sam Kwong]], [[Wenhui Wu]], [[Wei Gao]], and [[Ran Wang]] (2017, September). [https://ieeexplore.ieee.org/document/8324361 "Generalized Relevance Vector Machine"]. In: 2017 Intelligent Systems Conference (IntelliSys) (pp. 638-645). IEEE.
** QUOTE: This paper considers the generalized version of [[Relevance Vector Machine (RVM) Algorithm|relevance vector machine (RVM)]], which is a [[sparse Bayesian kernel machine]] for [[classification]] and ordinary [[regression]]. [[Generalized RVM (GRVM)]] follows the work of [[generalized linear model (GLM)]], which is a natural generalization of ordinary [[linear regression model]] and shares a common approach to [[estimate]] the [[parameter]]s. [[GRVM]] inherits the advantages of [[GLM]], i.e., [[unified model structure]], same [[training algorithm]], and convenient [[task-specific model design]]. It also inherits the advantages of [[Relevance Vector Machine (RVM) Algorithm|RVM]], i.e., [[probabilistic output]], extremely [[sparse solution]], [[hyperparameter auto-estimation]]. Besides, [[GRVM]] extends [[Relevance Vector Machine (RVM) Algorithm|RVM]] to a wider range of [[learning task]]s beyond [[classification]] and ordinary [[regression]] by assuming that the [[conditional output]] belongs to an [[exponential family distribution (EFD)]]. Since [[EFD]] results in an [[intractable problem|intractable inference problem]] in [[Bayesian analysis]], in this paper [[Laplace approximation]] is adopted to solve this problem, which is a common approach in [[Bayesian inference]]. Further, several [[task-specific model]]s are designed based on [[GRVM]], including [[model]]s for ordinary [[regression]], [[count data regression]], [[classification]], [[ordinal regression]], etc. Besides, the relationship between [[GRVM]] and traditional [[Relevance Vector Machine (RVM) Algorithm|RVM]] models is discussed (...)
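As a rough illustration of the [[Laplace approximation]] step mentioned in the quote, here is a minimal sketch (the toy count data, the Gaussian kernel design matrix, and the fixed per-weight precisions are all assumptions; the paper's actual algorithm also re-estimates the hyperparameters):
<pre>
import numpy as np

rng = np.random.default_rng(2)
X = np.linspace(0.0, 4.0, 40)
t = rng.poisson(np.exp(np.sin(X)))          # toy count-valued targets

Phi = np.exp(-(X[:, None] - X[None, :]) ** 2 / 2.0)   # Gaussian kernel design matrix
alpha = np.ones(Phi.shape[1])               # per-weight prior precisions (held fixed)
w = np.zeros(Phi.shape[1])

for _ in range(50):                         # Newton iterations to the MAP weights
    eta = np.clip(Phi @ w, -10.0, 10.0)     # linear predictor, clipped for stability
    mu = np.exp(eta)                        # Poisson mean under the canonical link
    grad = Phi.T @ (t - mu) - alpha * w     # gradient of the log-posterior
    H = Phi.T @ (mu[:, None] * Phi) + np.diag(alpha)  # negative Hessian
    w = w + np.linalg.solve(H, grad)

Sigma = np.linalg.inv(H)   # Laplace approximation: posterior ~ N(w_MAP, Sigma)
print("number of weights with |w_i| > 1e-3:", int(np.sum(np.abs(w) > 1e-3)))
</pre>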


=== 2010 ===
* ([[Saarela et al., 2010]]) ⇒ [[Matti Saarela]], [[Tapio Elomaa]], and [[Keijo Ruohonen]] (2010). [https://link.springer.com/chapter/10.1007%2F978-3-642-05177-7_11 "An Analysis of Relevance Vector Machine Regression"]. In: Advances in Machine Learning I (pp. 227-246). Springer, Berlin, Heidelberg. [https://doi.org/10.1007/978-3-642-05177-7_11 DOI: 10.1007/978-3-642-05177-7_11]. ISBN: 978-3-642-05177-7.
** QUOTE: The [[Relevance Vector Machine (RVM) Algorithm|relevance vector machine (RVM)]] is a [[Bayesian framework]] for [[learning sparse regression model]]s and [[classifier]]s. Despite its popularity and practical success, no thorough [[analysis]] of its functionality exists. In this paper we consider the [[Relevance Vector Machine (RVM) Algorithm|RVM]] in the case of [[regression model]]s and present two kinds of [[analysis]] results: we derive a full characterization of the behavior of the [[Relevance Vector Machine (RVM) Algorithm|RVM]] analytically when the columns of the [[regression matrix]] are orthogonal, and give some results concerning scale and [[rotation invariance]] of the [[Relevance Vector Machine (RVM) Algorithm|RVM]]. We also consider the practical implications of our results and present a scenario in which our results can be used to detect a potential weakness in the [[Relevance Vector Machine (RVM) Algorithm|RVM]] framework.


=== 2006 ===
* ([[Tzikas et al., 2006]]) ⇒ [[Dimitris Tzikas]], [[Liyang Wei]], [[Aristidis Likas]], [[Yongyi Yang]], and [[Nikolas P. Galatsanos]] (2006). [https://pdfs.semanticscholar.org/0a97/b301151cc6fd75c3a5ef6d1f0838e8714f5c.pdf?_ga=2.213211476.1958125896.1570161638-1669716821.1555811252 "A Tutorial On Relevance Vector Machines For Regression And Classification With Applications"].
** QUOTE: [[Relevance Vector Machine (RVM) Algorithm|Relevance vector machines (RVM)]] have recently attracted much interest in the [[research community]] because they provide a number of advantages. They are based on a [[Bayesian formulation]] of a [[linear model]] with an appropriate [[prior]] that results in a [[sparse representation]]. As a consequence, they can generalize well and provide inferences at low [[computational cost]]. In this [[tutorial]] we first present the basic theory of [[RVM Regression System|RVM for regression]] and [[RVM Classification System|classification]], followed by two examples illustrating the application of [[Relevance Vector Machine (RVM) Algorithm|RVM]] for [[object detection]] and [[classification]] (...) <P> [[Relevance Vector Machine (RVM) Algorithm|Relevance vector machine (RVM)]] is a special case of a [[sparse linear model]], where the [[basis function]]s are formed by a [[kernel function]] <math>\phi</math> centred at the different [[training point]]s: <P><div id="EQ6" style="text-align:center"><math>y(x)=\displaystyle \sum_{i=1}^N w_i\phi(x-x_i)</math></div><P> While this [[model]] is similar in form to the [[support vector machines (SVM)]], the [[kernel function]] here does not need to satisfy [[Mercer’s condition]], which requires <math>\phi</math> to be a [[continuous symmetric kernel]] of a [[positive integral operator]]. <P> [[Multi-kernel RVM]] is an extension of the simple [[RVM model]]. It consists of several different types of [[Kernel Function|kernel]]s <math>\phi_m</math>, given by: <P><div id="EQ7" style="text-align:center"><math>y(x)=\displaystyle \sum_{m=1}^M \sum_{i=1}^N w_{m,i}\phi_m(x-x_i)</math></div><P> The [[sparseness property]] enables [[automatic selection]] of the proper [[Kernel Function|kernel]] at each location by [[Pruning Task|pruning]] all irrelevant [[kernel]]s, though it is possible that two different [[Kernel Function|kernel]]s remain on the same location.
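The multi-kernel model in EQ7 amounts to concatenating one block of basis-function columns per kernel type. Below is a minimal sketch (the toy 1-D inputs and the choice of Gaussian and Laplacian kernels as the <math>M</math> kernel types are illustrative assumptions):
<pre>
import numpy as np

def gaussian(r, s=1.0):
    return np.exp(-r ** 2 / (2.0 * s ** 2))

def laplacian(r, s=1.0):
    return np.exp(-np.abs(r) / s)

def multi_kernel_design(x, centers, kernels):
    # One block of N columns per kernel type: column (m, i) holds phi_m(x - x_i).
    blocks = [k(x[:, None] - centers[None, :]) for k in kernels]
    return np.hstack(blocks)          # shape (len(x), M * len(centers))

x = np.linspace(0.0, 4.0, 50)
Phi = multi_kernel_design(x, centers=x, kernels=[gaussian, laplacian])
print(Phi.shape)                      # (50, 100): M = 2 kernels, N = 50 centres
</pre>
Sparse Bayesian learning over the resulting <math>M \times N</math> weights then prunes the irrelevant columns, which yields the per-location kernel selection described in the quote.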


=== 2005 ===
* (Rasmussen & Quinonero-Candela, 2005) ⇒ [[Carl Edward Rasmussen]], and [[Joaquin Quinonero-Candela]] (2005, August). [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.363.6103&rep=rep1&type=pdf "Healing the Relevance Vector Machine Through Augmentation"]. In: Proceedings of the 22nd International Conference on Machine Learning (pp. 689-696). ACM.
** QUOTE: The [[Relevance Vector Machine (RVM) Algorithm|Relevance Vector Machine (RVM)]] introduced by [[#2001|Tipping (2001)]] produces [[sparse solution]]s using an improper [[hierarchical prior]] and optimizing over [[hyperparameter]]s. The [[Relevance Vector Machine (RVM) Algorithm|RVM]] is exactly equivalent to a [[Gaussian Process]], where the [[Relevance Vector Machine (RVM) Algorithm|RVM]] [[hyperparameter]]s are [[parameter]]s of the [[GP covariance function]] (more on this in the discussion section). However, the [[covariance function]] of the [[Relevance Vector Machine (RVM) Algorithm|RVM]] seen as a [[GP]] is degenerate: its [[rank]] is at most equal to the number of [[relevance vector]]s of the [[Relevance Vector Machine (RVM) Algorithm|RVM]]. As a consequence, for [[localized basis function]]s, the [[Relevance Vector Machine (RVM) Algorithm|RVM]] produces [[predictive distribution]]s with properties opposite to what would be desirable. Indeed, the [[Relevance Vector Machine (RVM) Algorithm|RVM]] is more certain about its [[prediction]]s the further one moves away from the [[data]] it has been trained on. One would wish the opposite behaviour, as is the case with [[non-degenerate GP]]s, where the [[uncertainty]] of the [[prediction]]s is [[minimal]] for [[test point]]s in the regions of the [[input space]] where [[Training Data|(training) data]] has been seen. For [[non-localized basis function]]s, the same undesired effect persists, although the intuition may be less clear; see the discussion.
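The degeneracy described above can be seen directly in the RVM predictive variance, <math>\sigma^2(x_*) = \beta^{-1} + \phi(x_*)^\top \Sigma\, \phi(x_*)</math>: with [[localized basis function]]s, <math>\phi(x_*) \to 0</math> far from the [[training data]], so the variance falls back to the noise floor. A minimal sketch (toy inputs and fixed, illustrative hyperparameters; not the authors' code):
<pre>
import numpy as np

X = np.linspace(-2.0, 2.0, 20)                    # training inputs

def phi(x, centers, s=0.5):
    # Localized (Gaussian) basis functions centred on the training inputs.
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * s ** 2))

beta = 100.0                                      # noise precision (assumed known)
alpha = np.ones(len(X))                           # prior precisions (held fixed)
Phi = phi(X, X)
Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)  # posterior weight covariance

for x_star in (0.0, 2.0, 10.0):
    p = phi(np.array([x_star]), X)
    var = 1.0 / beta + (p @ Sigma @ p.T).item()   # predictive variance at x_star
    print(f"x* = {x_star:5.1f}   predictive variance = {var:.4f}")
# Far from the data (x* = 10) the variance collapses to 1/beta = 0.01,
# i.e. the model is most confident exactly where it has seen no data.
</pre>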


=== 2004 ===
* (Bishop, 2004) ⇒ [[Christopher M. Bishop]]. (2004). “Recent Advances in Bayesian Inference Techniques.” Keynote Presentation at SIAM Conference on Data Mining.
** Relevance Vector Machine (Tipping, 1999)
*** Bayesian alternative to support vector machine (SVM)
*** Properties
**** comparable error rates to SVM on new data
**** no cross-validation to set complexity parameters
**** applicable to wide choice of basis function
**** multi-class classification
**** probabilistic outputs
**** dramatically fewer kernels (by an order of magnitude)
**** but, slower to train than SVM
=== 2001 ===
* (Tipping, 2001) ⇒ [[Michael E. Tipping]] (2001). [http://www.jmlr.org/papers/volume1/tipping01a/tipping01a.pdf "Sparse Bayesian Learning and the Relevance Vector Machine"]. Journal of Machine Learning Research, 1(Jun), 211-244.
** QUOTE: Specifically, we adopt a fully [[probabilistic framework]] and introduce a [[prior]] over the [[model weight]]s governed by a [[set]] of [[hyperparameter]]s, one associated with each [[weight]], whose most [[probable value]]s are iteratively [[estimate]]d from the [[data]]. [[Sparsity]] is achieved because in practice we find that the [[posterior distribution]]s of many of the [[weight]]s are sharply (indeed infinitely) peaked around zero. We term those [[training vector]]s associated with the remaining non-zero [[weight]]s [[Relevance Vector|`relevance' vector]]s, in deference to the principle of [[automatic relevance determination]] which motivates the presented approach ([[MacKay, 1994]]; [[Neal, 1996]]). The most compelling feature of the [[Relevance Vector Machine (RVM) Algorithm|RVM]] is that, while capable of generalisation performance comparable to an equivalent [[SVM]], it typically utilises dramatically fewer [[kernel function]]s (...)
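The iterative hyperparameter re-estimation described in this quote can be written down compactly. Below is a minimal sketch (assumptions: toy sinc data, a Gaussian kernel, a known noise precision, and plain fixed-point updates rather than the faster marginal-likelihood scheme developed later):
<pre>
import numpy as np

rng = np.random.default_rng(1)
N = 50
X = np.linspace(-10.0, 10.0, N)
t = np.sinc(X / np.pi) + 0.05 * rng.standard_normal(N)   # noisy sin(x)/x targets

# Gaussian kernel design matrix: one basis function per training point.
Phi = np.exp(-(X[:, None] - X[None, :]) ** 2 / (2.0 * 2.0 ** 2))

alpha = np.ones(N)          # one hyperparameter alpha_i per weight
beta = 1.0 / 0.05 ** 2      # noise precision, assumed known here

for _ in range(200):
    # Weight posterior given the current hyperparameters.
    Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
    mu = beta * Sigma @ Phi.T @ t
    # gamma_i = 1 - alpha_i * Sigma_ii measures how well-determined w_i is.
    gamma = 1.0 - alpha * np.diag(Sigma)
    alpha = gamma / (mu ** 2 + 1e-12)   # re-estimate each alpha_i
    alpha = np.minimum(alpha, 1e12)     # a divergent alpha_i means w_i is pruned

relevant = alpha < 1e6
print(f"{relevant.sum()} relevance vectors out of {N} training points")
</pre>
Most <math>\alpha_i</math> diverge, so the corresponding posteriors peak at zero and those basis functions drop out; the few training points that survive are the relevance vectors.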


=== 2000a ===
* ([[Bishop & Tipping, 2000]]) ⇒ [[Christopher M. Bishop]], and [[Michael E. Tipping]] (2000). [https://arxiv.org/pdf/1301.3838.pdf "Variational Relevance Vector Machines"]. In: Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc.
** QUOTE: Recently [[#2000b|Tipping &#91;8&#93;]] introduced the [[Relevance Vector Machine (RVM) Algorithm|Relevance Vector Machine (RVM)]] which makes [[probabilistic prediction]]s and yet which retains the excellent [[predictive performance]] of the [[support vector machine]]. It also preserves the [[sparseness property]] of the [[SVM]]. Indeed, for a wide variety of [[test problem]]s it actually leads to [[model]]s which are dramatically [[sparser]] than the corresponding [[SVM]], while sacrificing little if anything in the [[accuracy]] of [[prediction]] (...) <P> As we have seen, the standard [[Relevance Vector Machine (RVM) Algorithm|relevance vector machine]] of [[#2000b|Tipping &#91;8&#93;]] [[estimate]]s [[point value]]s for the [[hyperparameter]]s. In this paper we seek a more complete [[Bayesian Theory|Bayesian treatment]] of the [[Relevance Vector Machine (RVM) Algorithm|RVM]] through exploitation of [[variational method]]s.


=== 2000b ===
* (Tipping, 2000) ⇒ [[Michael E. Tipping]] (2000). [http://papers.nips.cc/paper/1719-the-relevance-vector-machine.pdf?CFID=162554868&CFTOKEN=9291dfbb06cd5bb8-D195B895-A5B2-0F41-0B325B0D2934A619 "The Relevance Vector Machine"]. In: Advances in Neural Information Processing Systems (pp. 652-658).
** QUOTE: In this paper, we introduce the [[Relevance Vector Machine (RVM) Algorithm|relevance vector machine (RVM)]], a [[probabilistic sparse kernel model]] identical in [[functional form]] to the [[SVM]]. Here we adopt a [[Bayesian Learning System|Bayesian approach to learning]], where we introduce a [[prior]] over the [[weight]]s governed by a [[set]] of [[hyperparameter]]s, one associated with each [[weight]], whose most [[probable value]]s are [[iterative]]ly [[estimate]]d from the [[data]]. [[Sparsity]] is achieved because in practice we find that the [[posterior distribution]]s of many of the [[weight]]s are sharply peaked around zero. Furthermore, unlike the [[support vector classifier]], the nonzero [[weight]]s in the [[Relevance Vector Machine (RVM) Algorithm|RVM]] are not associated with examples close to the [[decision boundary]], but rather appear to represent '[[prototypical]]' examples of [[class]]es. We term these [[example]]s [[Relevance Vector|'relevance' vectors]], in deference to the principle of [[automatic relevance determination (ARD)]] which motivates the [[Relevance Vector Machine (RVM) Algorithm|presented approach]] <ref name="ref4">D. J. C. Mackay. Bayesian non-linear modelling for the prediction competition. In: ASHRAE Transactions, vol. 100, pages 1053-1062. ASHRAE, Atlanta, Georgia, 1994.</ref> <ref name="ref6">R. M. Neal. Bayesian Learning for Neural Networks. Springer, New York, 1996.</ref>.
<references/>
----
[[Category:Concept]]
[[Category:Machine Learning]]
__NOTOC__
