Relevance Vector Machine (RVM) Algorithm

A [[Relevance Vector Machine (RVM) Algorithm]] is a [[probabilistic model|probabilistic]] [[supervised learning algorithm]] that uses [[Bayesian inference]] ...
* <B>Context:</B>
** It can range from being a [[Relevance Vector Machine Regression Algorithm]] to being a [[Relevance Vector Machine Classification Algorithm]].
* <B>Example(s):</B>
** a [[Generalized Relevance Vector Machine]] ([[#2017|Jia et al., 2017]]),
** a [[Variational Relevance Vector Machine]] ([[#2000a|Bishop & Tipping, 2000]]),
** a [[Multi-Kernel Relevance Vector Machine]] (e.g. [[#2006|Tzikas et al., 2006]]),
** …
* <B>Counter-Example(s):</B>
** a [[Support Vector Machine]].
* <B>See:</B> [[Bayesian Analysis]], [[Automatic Relevance Determination]], [[Sparse Bayesian Learning System]], [[Sparse Bayesian Regression]], [[Machine Learning]], [[Bayesian Inference]], [[Occam's Razor]], [[Regression Analysis]], [[Probabilistic Classification]], [[Journal of Machine Learning Research]], [[Support Vector Machine]], [[Gaussian Process]], [[Covariance Function]], [[Kernel Function]].
----
== References ==

=== 2019 ===
* (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Relevance_vector_machine Retrieved:2019-10-4.
** In [[mathematics]], a '''Relevance Vector Machine (RVM)''' is a [[machine learning]] technique that uses [[Bayesian inference]] to obtain [[Occam's razor|parsimonious]] solutions for [[Regression analysis|regression]] and [[probabilistic classification]]. The RVM has an identical functional form to the [[support vector machine]], but provides probabilistic classification. It is actually equivalent to a [[Gaussian process]] model with [[covariance function]]: <P><math> k(\mathbf{x},\mathbf{x'}) = \sum_{j=1}^N \frac{1}{\alpha_j} \varphi(\mathbf{x},\mathbf{x}_j)\varphi(\mathbf{x}',\mathbf{x}_j) </math> <P>where <math> \varphi </math> is the [[kernel function]] (usually Gaussian), <math> \alpha_j </math> are the variances of the prior on the weight vector <math> w \sim N(0,\alpha^{-1}I) </math>, and <math> \mathbf{x}_1,\ldots,\mathbf{x}_N </math> are the input vectors of the [[training set]]. Compared to [[support vector machine]]s (SVMs), the Bayesian formulation of the RVM avoids the SVM's set of free parameters (which usually require cross-validation-based post-optimization). However, RVMs use an [[expectation maximization]] (EM)-like learning method and are therefore at risk of local minima. This is unlike the standard [[sequential minimal optimization]] (SMO)-based algorithms employed by [[Support vector machine|SVM]]s, which are guaranteed to find a global optimum (of the convex problem). The relevance vector machine is [[Software patents under United States patent law|patented in the United States]] by [[Microsoft]].
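The covariance identity quoted above is easy to check numerically. The following is a minimal sketch (not taken from the cited article; the Gaussian kernel, its length scale, and the example <math>\alpha_j</math> values are illustrative assumptions):
<pre>
import numpy as np

def gaussian_kernel(x, xj, length_scale=1.0):
    # phi(x, x_j): Gaussian kernel centred on the training input x_j.
    return np.exp(-np.sum((x - xj) ** 2) / (2.0 * length_scale ** 2))

def rvm_covariance(x, x_prime, X_train, alpha):
    # k(x, x') = sum_j (1/alpha_j) * phi(x, x_j) * phi(x', x_j)
    return sum((1.0 / a) * gaussian_kernel(x, xj) * gaussian_kernel(x_prime, xj)
               for xj, a in zip(X_train, alpha))

X_train = np.array([[0.0], [1.0], [2.0]])   # training inputs x_1..x_N
alpha = np.array([1.0, 10.0, 1e9])          # a huge alpha_j makes basis j contribute ~0
print(rvm_covariance(np.array([0.5]), np.array([0.7]), X_train, alpha))
</pre>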


=== 2017 ===
* (Jia et al., 2017) ⇒ [[Yuheng Jia]], [[Sam Kwong]], [[Wenhui Wu]], [[Wei Gao]], and [[Ran Wang]] (2017, September). [https://ieeexplore.ieee.org/document/8324361 "Generalized Relevance Vector Machine"]. In: 2017 Intelligent Systems Conference (IntelliSys) (pp. 638-645). IEEE.
** QUOTE: This paper considers the generalized version of [[Relevance Vector Machine (RVM) Algorithm|relevance vector machine (RVM)]], which is a [[sparse Bayesian kernel machine]] for [[classification]] and ordinary [[regression]]. [[Generalized RVM (GRVM)]] follows the work of [[generalized linear model (GLM)]], which is a natural generalization of ordinary [[linear regression model]] and shares a common approach to [[estimate]] the [[parameter]]s. [[GRVM]] inherits the advantages of [[GLM]], i.e., [[unified model structure]], same [[training algorithm]], and convenient [[task-specific model design]]. It also inherits the advantages of [[Relevance Vector Machine (RVM) Algorithm|RVM]], i.e., [[probabilistic output]], extremely [[sparse solution]], [[hyperparameter auto-estimation]]. Besides, [[GRVM]] extends [[Relevance Vector Machine (RVM) Algorithm|RVM]] to a wider range of [[learning task]]s beyond [[classification]] and ordinary [[regression]] by assuming that the [[conditional output]] belongs to an [[exponential family distribution (EFD)]]. Since [[EFD]] results in an [[intractable problem|intractable inference problem]] in [[Bayesian analysis]], in this paper [[Laplace approximation]] is adopted to solve this problem, which is a common approach in [[Bayesian inference]]. Further, several [[task-specific model]]s are designed based on [[GRVM]], including [[model]]s for ordinary [[regression]], [[count data regression]], [[classification]], [[ordinal regression]], etc. Besides, the relationship between [[GRVM]] and traditional [[Relevance Vector Machine (RVM) Algorithm|RVM]] models is discussed (...)
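As a rough illustration of the [[Laplace approximation]] step mentioned in the quote, here is a minimal sketch (the toy count data, the Gaussian kernel design matrix, and the fixed per-weight precisions are all assumptions; the paper's actual algorithm also re-estimates the hyperparameters):
<pre>
import numpy as np

rng = np.random.default_rng(2)
X = np.linspace(0.0, 4.0, 40)
t = rng.poisson(np.exp(np.sin(X)))          # toy count-valued targets

Phi = np.exp(-(X[:, None] - X[None, :]) ** 2 / 2.0)   # Gaussian kernel design matrix
alpha = np.ones(Phi.shape[1])               # per-weight prior precisions (held fixed)
w = np.zeros(Phi.shape[1])

for _ in range(50):                         # Newton iterations to the MAP weights
    eta = np.clip(Phi @ w, -10.0, 10.0)     # linear predictor, clipped for stability
    mu = np.exp(eta)                        # Poisson mean under the canonical link
    grad = Phi.T @ (t - mu) - alpha * w     # gradient of the log-posterior
    H = Phi.T @ (mu[:, None] * Phi) + np.diag(alpha)  # negative Hessian
    w = w + np.linalg.solve(H, grad)

Sigma = np.linalg.inv(H)   # Laplace approximation: posterior ~ N(w_MAP, Sigma)
print("number of weights with |w_i| > 1e-3:", int(np.sum(np.abs(w) > 1e-3)))
</pre>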


=== 2010 ===
* ([[Saarela et al., 2010]]) ⇒ [[Matti Saarela]], [[Tapio Elomaa]], and [[Keijo Ruohonen]] (2010). [https://link.springer.com/chapter/10.1007%2F978-3-642-05177-7_11 "An Analysis of Relevance Vector Machine Regression"]. In: Advances in Machine Learning I (pp. 227-246). Springer, Berlin, Heidelberg. [https://doi.org/10.1007/978-3-642-05177-7_11 DOI: 10.1007/978-3-642-05177-7_11]. ISBN: 978-3-642-05177-7.
** QUOTE: The [[Relevance Vector Machine (RVM) Algorithm|relevance vector machine (RVM)]] is a [[Bayesian framework]] for [[learning sparse regression model]]s and [[classifier]]s. Despite its popularity and practical success, no thorough [[analysis]] of its functionality exists. In this paper we consider the [[Relevance Vector Machine (RVM) Algorithm|RVM]] in the case of [[regression model]]s and present two kinds of [[analysis]] results: we derive a full characterization of the behavior of the [[Relevance Vector Machine (RVM) Algorithm|RVM]] analytically when the columns of the [[regression matrix]] are orthogonal, and give some results concerning scale and [[rotation invariance]] of the [[Relevance Vector Machine (RVM) Algorithm|RVM]]. We also consider the practical implications of our results and present a scenario in which our results can be used to detect a potential weakness in the [[Relevance Vector Machine (RVM) Algorithm|RVM]] framework.


=== 2006 ===
* ([[Tzikas et al., 2006]]) ⇒ [[Dimitris Tzikas]], [[Liyang Wei]], [[Aristidis Likas]], [[Yongyi Yang]], and [[Nikolas P. Galatsanos]] (2006). [https://pdfs.semanticscholar.org/0a97/b301151cc6fd75c3a5ef6d1f0838e8714f5c.pdf?_ga=2.213211476.1958125896.1570161638-1669716821.1555811252 "A Tutorial On Relevance Vector Machines For Regression And Classification With Applications"].
** QUOTE: [[Relevance Vector Machine (RVM) Algorithm|Relevance vector machines (RVM)]] have recently attracted much interest in the [[research community]] because they provide a number of advantages. They are based on a [[Bayesian formulation]] of a [[linear model]] with an appropriate [[prior]] that results in a [[sparse representation]]. As a consequence, they can generalize well and provide inferences at low [[computational cost]]. In this [[tutorial]] we first present the basic theory of [[RVM Regression System|RVM for regression]] and [[RVM Classification System|classification]], followed by two examples illustrating the application of [[Relevance Vector Machine (RVM) Algorithm|RVM]] for [[object detection]] and [[classification]] (...) <P> [[Relevance Vector Machine (RVM) Algorithm|Relevance vector machine (RVM)]] is a special case of a [[sparse linear model]], where the [[basis function]]s are formed by a [[kernel function]] <math>\phi</math> centred at the different [[training point]]s: <P><div id="EQ6" style="text-align:center"><math>y(x)=\displaystyle \sum_{i=1}^N w_i\phi(x-x_i)</math></div><P> While this [[model]] is similar in form to the [[support vector machines (SVM)]], the [[kernel function]] here does not need to satisfy [[Mercer’s condition]], which requires <math>\phi</math> to be a [[continuous symmetric kernel]] of a [[positive integral operator]]. <P> [[Multi-kernel RVM]] is an extension of the simple [[RVM model]]. It consists of several different types of [[Kernel Function|kernel]]s <math>\phi_m</math>, given by: <P><div id="EQ7" style="text-align:center"><math>y(x)=\displaystyle \sum_{m=1}^M \sum_{i=1}^N w_{m,i}\phi_m(x-x_i)</math></div><P> The [[sparseness property]] enables [[automatic selection]] of the proper [[Kernel Function|kernel]] at each location by [[Pruning Task|pruning]] all irrelevant [[kernel]]s, though it is possible that two different [[Kernel Function|kernel]]s remain on the same location.
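The multi-kernel model in EQ7 amounts to concatenating one block of basis-function columns per kernel type. Below is a minimal sketch (the toy 1-D inputs and the choice of Gaussian and Laplacian kernels as the <math>M</math> kernel types are illustrative assumptions):
<pre>
import numpy as np

def gaussian(r, s=1.0):
    return np.exp(-r ** 2 / (2.0 * s ** 2))

def laplacian(r, s=1.0):
    return np.exp(-np.abs(r) / s)

def multi_kernel_design(x, centers, kernels):
    # One block of N columns per kernel type: column (m, i) holds phi_m(x - x_i).
    blocks = [k(x[:, None] - centers[None, :]) for k in kernels]
    return np.hstack(blocks)          # shape (len(x), M * len(centers))

x = np.linspace(0.0, 4.0, 50)
Phi = multi_kernel_design(x, centers=x, kernels=[gaussian, laplacian])
print(Phi.shape)                      # (50, 100): M = 2 kernels, N = 50 centres
</pre>
Sparse Bayesian learning over the resulting <math>M \times N</math> weights then prunes the irrelevant columns, which yields the per-location kernel selection described in the quote.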


=== 2005 ===
* (Rasmussen & Quinonero-Candela, 2005) ⇒ [[Carl Edward Rasmussen]], and [[Joaquin Quinonero-Candela]] (2005, August). [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.363.6103&rep=rep1&type=pdf "Healing the Relevance Vector Machine Through Augmentation"]. In: Proceedings of the 22nd International Conference on Machine Learning (pp. 689-696). ACM.
** QUOTE: The [[Relevance Vector Machine (RVM) Algorithm|Relevance Vector Machine (RVM)]] introduced by [[#2001|Tipping (2001)]] produces [[sparse solution]]s using an improper [[hierarchical prior]] and optimizing over [[hyperparameter]]s. The [[Relevance Vector Machine (RVM) Algorithm|RVM]] is exactly equivalent to a [[Gaussian Process]], where the [[Relevance Vector Machine (RVM) Algorithm|RVM]] [[hyperparameter]]s are [[parameter]]s of the [[GP covariance function]] (more on this in the discussion section). However, the [[covariance function]] of the [[Relevance Vector Machine (RVM) Algorithm|RVM]] seen as a [[GP]] is degenerate: its [[rank]] is at most equal to the number of [[relevance vector]]s of the [[Relevance Vector Machine (RVM) Algorithm|RVM]]. As a consequence, for [[localized basis function]]s, the [[Relevance Vector Machine (RVM) Algorithm|RVM]] produces [[predictive distribution]]s with properties opposite to what would be desirable. Indeed, the [[Relevance Vector Machine (RVM) Algorithm|RVM]] is more certain about its [[prediction]]s the further one moves away from the [[data]] it has been trained on. One would wish the opposite behaviour, as is the case with [[non-degenerate GP]]s, where the [[uncertainty]] of the [[prediction]]s is [[minimal]] for [[test point]]s in the regions of the [[input space]] where [[Training Data|(training) data]] has been seen. For [[non-localized basis function]]s, the same undesired effect persists, although the intuition may be less clear; see the discussion.
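The degeneracy described above can be seen directly in the RVM predictive variance, <math>\sigma^2(x_*) = \beta^{-1} + \phi(x_*)^\top \Sigma\, \phi(x_*)</math>: with [[localized basis function]]s, <math>\phi(x_*) \to 0</math> far from the [[training data]], so the variance falls back to the noise floor. A minimal sketch (toy inputs and fixed, illustrative hyperparameters; not the authors' code):
<pre>
import numpy as np

X = np.linspace(-2.0, 2.0, 20)                    # training inputs

def phi(x, centers, s=0.5):
    # Localized (Gaussian) basis functions centred on the training inputs.
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * s ** 2))

beta = 100.0                                      # noise precision (assumed known)
alpha = np.ones(len(X))                           # prior precisions (held fixed)
Phi = phi(X, X)
Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)  # posterior weight covariance

for x_star in (0.0, 2.0, 10.0):
    p = phi(np.array([x_star]), X)
    var = 1.0 / beta + (p @ Sigma @ p.T).item()   # predictive variance at x_star
    print(f"x* = {x_star:5.1f}   predictive variance = {var:.4f}")
# Far from the data (x* = 10) the variance collapses to 1/beta = 0.01,
# i.e. the model is most confident exactly where it has seen no data.
</pre>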


=== 2004 ===
* (Bishop, 2004) ⇒ [[Christopher M. Bishop]]. (2004). “Recent Advances in Bayesian Inference Techniques.” Keynote Presentation at SIAM Conference on Data Mining.
** Relevance Vector Machine (Tipping, 1999)
*** Bayesian alternative to support vector machine (SVM)
*** Properties
**** comparable error rates to SVM on new data
**** no cross-validation to set complexity parameters
**** applicable to wide choice of basis function
**** multi-class classification
**** probabilistic outputs
**** dramatically fewer kernels (by an order of magnitude)
**** but, slower to train than SVM
=== 2001 ===
* (Tipping, 2001) ⇒ [[Michael E. Tipping]] (2001). [http://www.jmlr.org/papers/volume1/tipping01a/tipping01a.pdf "Sparse Bayesian Learning and the Relevance Vector Machine"]. Journal of Machine Learning Research, 1(Jun), 211-244.
** QUOTE: Specifically, we adopt a fully [[probabilistic framework]] and introduce a [[prior]] over the [[model weight]]s governed by a [[set]] of [[hyperparameter]]s, one associated with each [[weight]], whose most [[probable value]]s are iteratively [[estimate]]d from the [[data]]. [[Sparsity]] is achieved because in practice we find that the [[posterior distribution]]s of many of the [[weight]]s are sharply (indeed infinitely) peaked around zero. We term those [[training vector]]s associated with the remaining non-zero [[weight]]s [[Relevance Vector|`relevance' vector]]s, in deference to the principle of [[automatic relevance determination]] which motivates the presented approach ([[MacKay, 1994]]; [[Neal, 1996]]). The most compelling feature of the [[Relevance Vector Machine (RVM) Algorithm|RVM]] is that, while capable of generalisation performance comparable to an equivalent [[SVM]], it typically utilises dramatically fewer [[kernel function]]s (...)
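The iterative hyperparameter re-estimation described in this quote can be written down compactly. Below is a minimal sketch (assumptions: toy sinc data, a Gaussian kernel, a known noise precision, and plain fixed-point updates rather than the faster marginal-likelihood scheme developed later):
<pre>
import numpy as np

rng = np.random.default_rng(1)
N = 50
X = np.linspace(-10.0, 10.0, N)
t = np.sinc(X / np.pi) + 0.05 * rng.standard_normal(N)   # noisy sin(x)/x targets

# Gaussian kernel design matrix: one basis function per training point.
Phi = np.exp(-(X[:, None] - X[None, :]) ** 2 / (2.0 * 2.0 ** 2))

alpha = np.ones(N)          # one hyperparameter alpha_i per weight
beta = 1.0 / 0.05 ** 2      # noise precision, assumed known here

for _ in range(200):
    # Weight posterior given the current hyperparameters.
    Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
    mu = beta * Sigma @ Phi.T @ t
    # gamma_i = 1 - alpha_i * Sigma_ii measures how well-determined w_i is.
    gamma = 1.0 - alpha * np.diag(Sigma)
    alpha = gamma / (mu ** 2 + 1e-12)   # re-estimate each alpha_i
    alpha = np.minimum(alpha, 1e12)     # a divergent alpha_i means w_i is pruned

relevant = alpha < 1e6
print(f"{relevant.sum()} relevance vectors out of {N} training points")
</pre>
Most <math>\alpha_i</math> diverge, so the corresponding posteriors peak at zero and those basis functions drop out; the few training points that survive are the relevance vectors.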


=== 2000a ===
* ([[Bishop & Tipping, 2000]]) ⇒ [[Christopher M. Bishop]], and [[Michael E. Tipping]] (2000). [https://arxiv.org/pdf/1301.3838.pdf "Variational Relevance Vector Machines"]. In: Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc.
** QUOTE: Recently [[#2000b|Tipping &#91;8&#93;]] introduced the [[Relevance Vector Machine (RVM) Algorithm|Relevance Vector Machine (RVM)]] which makes [[probabilistic prediction]]s and yet which retains the excellent [[predictive performance]] of the [[support vector machine]]. It also preserves the [[sparseness property]] of the [[SVM]]. Indeed, for a wide variety of [[test problem]]s it actually leads to [[model]]s which are dramatically [[sparser]] than the corresponding [[SVM]], while sacrificing little if anything in the [[accuracy]] of [[prediction]] (...) <P> As we have seen, the standard [[Relevance Vector Machine (RVM) Algorithm|relevance vector machine]] of [[#2000b|Tipping &#91;8&#93;]] [[estimate]]s [[point value]]s for the [[hyperparameter]]s. In this paper we seek a more complete [[Bayesian Theory|Bayesian treatment]] of the [[Relevance Vector Machine (RVM) Algorithm|RVM]] through exploitation of [[variational method]]s.


=== 2000b ===
* (Tipping, 2000) ⇒ [[Michael E. Tipping]] (2000). [http://papers.nips.cc/paper/1719-the-relevance-vector-machine.pdf?CFID=162554868&CFTOKEN=9291dfbb06cd5bb8-D195B895-A5B2-0F41-0B325B0D2934A619 "The Relevance Vector Machine"]. In: Advances in Neural Information Processing Systems (pp. 652-658).
** QUOTE: In this paper, we introduce the [[Relevance Vector Machine (RVM) Algorithm|relevance vector machine (RVM)]], a [[probabilistic sparse kernel model]] identical in [[functional form]] to the [[SVM]]. Here we adopt a [[Bayesian Learning System|Bayesian approach to learning]], where we introduce a [[prior]] over the [[weight]]s governed by a [[set]] of [[hyperparameter]]s, one associated with each [[weight]], whose most [[probable value]]s are [[iterative]]ly [[estimate]]d from the [[data]]. [[Sparsity]] is achieved because in practice we find that the [[posterior distribution]]s of many of the [[weight]]s are sharply peaked around zero. Furthermore, unlike the [[support vector classifier]], the nonzero [[weight]]s in the [[Relevance Vector Machine (RVM) Algorithm|RVM]] are not associated with examples close to the [[decision boundary]], but rather appear to represent '[[prototypical]]' examples of [[class]]es. We term these [[example]]s [[Relevance Vector|'relevance' vectors]], in deference to the principle of [[automatic relevance determination (ARD)]] which motivates the [[Relevance Vector Machine (RVM) Algorithm|presented approach]] <ref name="ref4">D. J. C. Mackay. Bayesian non-linear modelling for the prediction competition. In: ASHRAE Transactions, vol. 100, pages 1053-1062. ASHRAE, Atlanta, Georgia, 1994.</ref> <ref name="ref6">R. M. Neal. Bayesian Learning for Neural Networks. Springer, New York, 1996.</ref>.
<references/>
----
[[Category:Concept]]
[[Category:Machine Learning]]
__NOTOC__
