Relevance Vector Machine (RVM) Algorithm

A Relevance Vector Machine (RVM) Algorithm is a probabilistic supervised learning algorithm that uses Bayesian inference to produce sparse kernel-based models with probabilistic outputs, a Bayesian alternative to the support vector machine (SVM).
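The sparse Bayesian training loop behind the RVM can be sketched in a few lines of NumPy. The following is a minimal illustration of the hyperparameter re-estimation scheme described in the references below (Tipping-style type-II maximum likelihood); the helper names (<code>rbf</code>, <code>rvm_fit</code>), kernel width, iteration count, and pruning threshold are illustrative assumptions, not a reference implementation:

<syntaxhighlight lang="python">
import numpy as np

def rbf(a, b, gamma=2.0):
    # Gaussian (localized) basis functions centred on the points in b.
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

def rvm_fit(X, t, gamma=2.0, n_iter=300, alpha_cap=1e8):
    """Sparse Bayesian regression, RVM-style: re-estimate one prior
    precision alpha_i per weight plus the noise precision beta, pruning
    weights whose precision diverges (their posterior is pinned at zero).
    Illustrative sketch, not Tipping's reference implementation."""
    N = len(X)
    keep = np.arange(N)            # indices of surviving basis functions
    alpha = np.ones(N)             # prior precision of each weight
    beta = 1.0 / np.var(t)         # noise precision, rough initial guess
    Phi = rbf(X, X, gamma)         # one basis function per training point
    for _ in range(n_iter):
        # Weight posterior given the current hyperparameters.
        Sigma = np.linalg.inv(beta * Phi.T @ Phi + np.diag(alpha))
        mu = beta * Sigma @ Phi.T @ t
        # Fixed-point (MacKay-style) evidence-maximization updates.
        g = 1.0 - alpha * np.diag(Sigma)   # well-determinedness of each weight
        alpha = g / (mu ** 2 + 1e-12)
        beta = (N - g.sum()) / (np.sum((t - Phi @ mu) ** 2) + 1e-12)
        # Prune basis functions whose weights have been driven to zero.
        m = alpha < alpha_cap
        alpha, Phi, keep = alpha[m], Phi[:, m], keep[m]
    Sigma = np.linalg.inv(beta * Phi.T @ Phi + np.diag(alpha))
    mu = beta * Sigma @ Phi.T @ t
    return keep, mu, Sigma, beta
</syntaxhighlight>

Most of the precisions alpha_i diverge during training, so most weights are pruned; the training points whose basis functions survive are the relevance vectors.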



== References ==

=== 2019 ===

=== 2017 ===

=== 2010 ===

=== 2006 ===

=== 2005 ===
* (Rasmussen & Quinonero-Candela, 2005) ⇒ [[Carl Edward Rasmussen]], and [[Joaquin Quinonero-Candela]] (2005, August). [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.363.6103&rep=rep1&type=pdf "Healing the Relevance Vector Machine Through Augmentation"]. In: Proceedings of the 22nd International Conference on Machine Learning (pp. 689-696). ACM.
** QUOTE: The [[Relevance Vector Machine (RVM) Algorithm|Relevance Vector Machine (RVM)]] introduced by [[#2001|Tipping (2001)]] produces [[sparse solution]]s using an improper [[hierarchical prior]] and optimizing over [[hyperparameter]]s. The [[RVM]] is exactly equivalent to a [[Gaussian Process]], where the [[RVM]] [[hyperparameter]]s are [[parameter]]s of the [[GP covariance function]] (more on this in the discussion section). However, the [[covariance function]] of the [[RVM]] seen as a [[GP]] is degenerate: its [[rank]] is at most equal to the number of [[relevance vector]]s of the [[RVM]]. As a consequence, for [[localized basis function]]s, the [[RVM]] produces [[predictive distribution]]s with properties opposite to what would be desirable. Indeed, the [[RVM]] is more certain about its [[prediction]]s the further one moves away from the [[data]] it has been trained on. One would wish the opposite behaviour, as is the case with [[non-degenerate GP]]s, where the [[uncertainty]] of the [[prediction]]s is [[minimal]] for [[test point]]s in the regions of the [[input space]] where [[Training Data|(training) data]] has been seen. For [[non-localized basis function]]s, the same undesired effect persists, although the intuition may be less clear, see the discussion.
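The degeneracy described in this quote is easy to check numerically. Below is a small sketch comparing the predictive variance of a finite RBF-basis (degenerate) model with that of a non-degenerate GP using the same kernel; the unit-variance weight prior, noise precision, and toy inputs are assumptions chosen for illustration:

<syntaxhighlight lang="python">
import numpy as np

def rbf(a, b, gamma=2.0):
    # Localized (Gaussian) basis / covariance function.
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

beta = 100.0                               # noise precision, assumed known
X = np.linspace(-2.0, 2.0, 20)             # training inputs
Phi = rbf(X, X)                            # also the GP covariance matrix K
Sigma = np.linalg.inv(beta * Phi.T @ Phi + np.eye(len(X)))  # weight posterior cov.

for x_star in (0.0, 5.0, 50.0):
    phi = rbf(np.array([x_star]), X)       # basis responses at the test point
    # Degenerate (RVM-as-GP) predictive variance: collapses to 1/beta far away.
    var_deg = 1.0 / beta + (phi @ Sigma @ phi.T).item()
    # Non-degenerate GP predictive variance: reverts to the prior k(x*, x*) = 1.
    var_gp = 1.0 - (phi @ np.linalg.solve(Phi + np.eye(len(X)) / beta, phi.T)).item() + 1.0 / beta
    print(f"x*={x_star:5.1f}   degenerate var={var_deg:.4f}   full-GP var={var_gp:.4f}")
</syntaxhighlight>

At x* = 50 the localized basis responses are essentially zero, so the degenerate model reports variance 1/beta = 0.01 (maximally confident far from the data), while the full GP reverts to roughly its prior variance of 1 plus noise, which is the behaviour the quoted passage argues one should want.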

=== 2004 ===

* (Bishop, 2004) ⇒ [[Christopher M. Bishop]]. (2004). "Recent Advances in Bayesian Inference Techniques". Keynote Presentation at SIAM Conference on Data Mining.
** Relevance Vector Machine (Tipping, 1999)
*** Bayesian alternative to support vector machine (SVM)
*** Properties (see the usage sketch after this list):
**** comparable error rates to SVM on new data
**** no cross-validation to set complexity parameters
**** applicable to wide choice of basis function
**** multi-class classification
**** probabilistic outputs
**** dramatically fewer kernels (by an order of magnitude)
**** but, slower to train than SVM
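Two of these properties, probabilistic outputs and dramatically fewer kernels, can be seen directly with the illustrative <code>rvm_fit</code> helper sketched in the lead section; the toy data and test points below are assumptions:

<syntaxhighlight lang="python">
import numpy as np  # assumes the rbf and rvm_fit sketches from the lead section

rng = np.random.default_rng(0)
X = np.linspace(-5.0, 5.0, 60)
t = np.sinc(X) + 0.05 * rng.standard_normal(60)

keep, mu, Sigma, beta = rvm_fit(X, t)
print(f"{len(keep)} relevance vectors kept out of {len(X)} training points")

# Probabilistic outputs: a full predictive mean and variance per test point.
X_test = np.linspace(-6.0, 6.0, 5)
Phi_star = rbf(X_test, X[keep])            # basis centred on the relevance vectors
mean = Phi_star @ mu
var = 1.0 / beta + np.sum((Phi_star @ Sigma) * Phi_star, axis=1)
for x, m, v in zip(X_test, mean, var):
    print(f"x={x:+5.2f}  mean={m:+6.3f}  std={v ** 0.5:.3f}")
</syntaxhighlight>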

=== 2001 ===

=== 2000a ===
* ([[Bishop & Tipping, 2000]]) ⇒ [[Christopher M. Bishop]], and [[Michael E. Tipping]] (2000). [https://arxiv.org/pdf/1301.3838.pdf "Variational Relevance Vector Machines"]. In: Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc.
** QUOTE: Recently [[#2000b|Tipping &#91;8&#93;]] introduced the [[Relevance Vector Machine (RVM) Algorithm|Relevance Vector Machine (RVM)]] which makes [[probabilistic prediction]]s and yet which retains the excellent [[predictive performance]] of the [[support vector machine]]. It also preserves the [[sparseness property]] of the [[SVM]]. Indeed, for a wide variety of [[test problem]]s it actually leads to [[model]]s which are dramatically [[sparser]] than the corresponding [[SVM]], while sacrificing little if anything in the [[accuracy]] of [[prediction]] (...) <P> As we have seen, the standard [[relevance vector machine]] of [[#2000b|Tipping &#91;8&#93;]] [[estimate]]s [[point value]]s for the [[hyperparameter]]s. In this paper we seek a more complete [[Bayesian Theory|Bayesian treatment]] of the [[RVM]] through exploitation of [[variational method]]s.
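For context on the "point values for the hyperparameters" mentioned in this quote, the standard RVM setup can be written compactly (a sketch in the usual notation of Tipping's formulation, not taken from this page): a separate precision hyperparameter per weight, and a Gaussian marginal likelihood over the targets,

<math>
p(\mathbf{w} \mid \boldsymbol{\alpha}) \;=\; \prod_i \mathcal{N}\!\left(w_i \mid 0,\ \alpha_i^{-1}\right),
\qquad
p(\mathbf{t} \mid \boldsymbol{\alpha}, \beta) \;=\; \mathcal{N}\!\left(\mathbf{t} \mid \mathbf{0},\ \beta^{-1}\mathbf{I} + \boldsymbol{\Phi}\,\mathrm{diag}(\boldsymbol{\alpha})^{-1}\boldsymbol{\Phi}^{\top}\right).
</math>

The standard RVM maximizes the marginal likelihood on the right to obtain point estimates of <math>(\boldsymbol{\alpha}, \beta)</math>; the variational treatment proposed in the quoted paper instead maintains approximate posterior distributions over these hyperparameters.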

=== 2000b ===
