2010 RegularizedMarginbasedCondition
- (Jin et al., 2010) ⇒ Xiao-Bo Jin, Cheng-Lin Liu, and Xinwen Hou. (2010). “Regularized Margin-based Conditional Log-likelihood Loss for Prototype Learning.” In: Pattern Recognition Journal, 43(7). doi:10.1016/j.patcog.2010.01.013
Subject Headings: Regularization Term
Notes
- It proposes a Prototype Learning Algorithm that minimizes a conditional log-likelihood loss (CLL) called log-likelihood of margin (LOGM).
- It proposes the addition of a Regularization Term to the loss function to avoid over-fitting as well as to maximize the hypothesis margin (see the sketch after this list).
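The following is a minimal sketch, in Python, of the margin-based conditional log-likelihood idea. It assumes a GLVQ-style relative margin between the nearest genuine-class prototype and the nearest rival prototype, a logistic (sigmoid) link for the probability of correct classification, and a regularization term proportional to the distance to the genuine prototype; the exact margin form, sharpness `xi`, and regularizer weight `lam` are illustrative assumptions, not the paper's precise formulation.

```python
import numpy as np

def logm_style_loss(x, prototypes, proto_labels, y, xi=1.0, lam=0.01):
    """Hedged sketch of a log-likelihood-of-margin (LOGM) style loss.

    x            : (d,) input pattern (numpy array)
    prototypes   : (m, d) prototype vectors
    proto_labels : (m,) class label of each prototype
    y            : true class label of x
    xi, lam      : assumed sharpness and regularization weights
    """
    d2 = np.sum((prototypes - x) ** 2, axis=1)               # squared Euclidean distances
    d_genuine = d2[proto_labels == y].min()                  # nearest prototype of the correct class
    d_rival = d2[proto_labels != y].min()                    # nearest prototype of any other class
    margin = (d_rival - d_genuine) / (d_rival + d_genuine)   # relative margin in [-1, 1]
    p_correct = 1.0 / (1.0 + np.exp(-xi * margin))           # logistic model of P(correct | x)
    return -np.log(p_correct) + lam * d_genuine              # CLL term plus regularization
```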
Cited By
- http://scholar.google.com/scholar?q=%222010%22+Regularized%20Margin-based%20Conditional%20Log-likelihood%20Loss%20for%20Prototype%20Learning
- http://dl.acm.org/citation.cfm?id=1752253.1752392&preflayout=flat#citedby
Quotes
Abstract
The classification performance of nearest prototype classifiers largely relies on the prototype learning algorithm. The minimum classification error (MCE) method and the soft nearest prototype classifier (SNPC) method are two important algorithms using misclassification loss. This paper proposes a new prototype learning algorithm based on the conditional log-likelihood loss (CLL), which is based on the discriminative model called log-likelihood of margin (LOGM). A regularization term is added to avoid over-fitting in training as well as to maximize the hypothesis margin. The CLL in the LOGM algorithm is a convex function of margin, and so, shows better convergence than the MCE. In addition, we show the effects of distance metric learning with both prototype-dependent weighting and prototype-independent weighting. Our empirical study on the benchmark datasets demonstrates that the LOGM algorithm yields higher classification accuracies than the MCE, generalized learning vector quantization (GLVQ), soft nearest prototype classifier (SNPC) and the robust soft learning vector quantization (RSLVQ), and moreover, the LOGM with prototype-dependent weighting achieves comparable accuracies to the support vector machine (SVM) classifier.
1. Introduction
Nearest neighbor classification [1] is a simple and appealing non-parametric method for pattern classification. It does not need any processing of the training data, but involves heavy burden of storage space and computation in classification. Prototype learning is to represent the training data with a set of points in feature space, called prototypes. A test point x is then classified to the class of the closest prototype. To reduce the number of prototypes, many prototype selection and prototype learning methods have been proposed [2–7]. By selecting or synthesizing prototypes that better represent the class distributions or decision boundaries, these methods are also effective to improve the classification accuracy. The nearest neighbor classifiers with reduced prototypes are also called nearest prototype classifiers. They have been widely used in applications such as character recognition [7], text categorization [8], classification of mass spectrometry data [9], and so on.
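As a concrete illustration of the classification rule described above, the sketch below (with illustrative names, not code from the paper) assigns a test point to the class of its nearest prototype under squared Euclidean distance.

```python
import numpy as np

def nearest_prototype_classify(x, prototypes, proto_labels):
    """Classify x to the class of its closest prototype (squared Euclidean distance)."""
    d2 = np.sum((prototypes - x) ** 2, axis=1)   # distance of x to every prototype
    return proto_labels[np.argmin(d2)]           # label of the nearest prototype

# Example: two prototypes, one per class; the test point is closer to class 1.
protos = np.array([[0.0, 0.0], [2.0, 2.0]])
labels = np.array([0, 1])
print(nearest_prototype_classify(np.array([1.8, 1.9]), protos, labels))  # -> 1
```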
Learning vector quantization (LVQ) [10] is a well known prototype learning algorithm which offers intuitive and simple, yet powerful learning capacity in supervised learning. Kohonen proposed a number of improved versions of LVQ such as LVQ2, LVQ2.1, and LVQ3 [3]. Crammer et al. [11] show that LVQ falls in a family of maximal margin learning algorithms providing a rigorous upper bound of generalization error. Although the LVQ algorithm yields superior classification performance, it does not guarantee convergence in training [4]. The performance of LVQ depends on several factors: the initialization of prototypes, the distance metric, and the selection of informative patterns, etc.
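For reference, the classic LVQ1 update rule attracts the winning (nearest) prototype when its class matches the training sample and repels it otherwise. The sketch below is this textbook rule under an assumed learning rate `lr`, not one of the improved variants (LVQ2, LVQ2.1, LVQ3) mentioned above.

```python
import numpy as np

def lvq1_update(x, y, prototypes, proto_labels, lr=0.05):
    """One online LVQ1 step: move the winning prototype toward x if its class
    matches y, away from x otherwise. Updates `prototypes` in place."""
    d2 = np.sum((prototypes - x) ** 2, axis=1)
    k = np.argmin(d2)                              # index of the winning prototype
    sign = 1.0 if proto_labels[k] == y else -1.0   # attract same class, repel others
    prototypes[k] += sign * lr * (x - prototypes[k])
    return prototypes
```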
To improve the training and generalization performance of LVQ, many extensions or variants have been proposed. In [12], an initialization insensitive LVQ algorithm uses a harmonic average distance instead of the usual nearest-neighbor distance. Sato and Yamada proposed a generalized LVQ (GLVQ) algorithm [4], where prototypes are updated based on a continuous and differentiable loss function. The generalized relevance LVQ (GRLVQ) algorithm [13] introduces the adaptive weighted metric to extend the GLVQ, by adding a prototype-independent weight to each input dimension indicating its relevance. Similarly, the algorithm of Paredes and Vidal [14] learns prototypes and distance metric simultaneously using a misclassification loss function. In [15], the combination of pattern selection and weighted distance norm is explored. It selects an update set composed of points that are considered to be at the risk of being captured by a prototype of a different class.
Many prototype learning algorithms based on loss minimization have been shown to give higher classification accuracies than LVQ [7]. These algorithms include the minimum classification error (MCE) method [16], the generalized LVQ (GLVQ) [4], the maximum class probability (MAXP) method [7], the soft nearest prototype classifier (SNPC) [17] and the robust soft learning vector quantization (RSLVQ) [18], etc. The MCE and GLVQ methods minimize margin-based loss functions, while the MAXP and SNPC formulate multiple prototypes in a Gaussian mixture framework and aim to minimize the Bayesian classification error on training data. Similarly, the RSLVQ optimizes the conditional log-likelihood loss (CLL) within a Gaussian mixture framework. The misclassification loss of [14] is similar to those of MCE and GLVQ. The margin-based algorithms, including LVQ, MCE and GLVQ, adjust only two prototypes on a training sample. The training complexity of the MAXP, SNPC and the RSLVQ method can be similarly decreased by pruning prototype updating according to the proximity of prototypes to the input pattern or using the windowing rule like the LVQ2.1.
In this paper, we investigate the loss functions of prototype learning algorithms and aim to improve the classification performance using a new form of loss function. The misclassification loss functions of MCE, GLVQ and the one in [14], based on the nearest prototype from the genuine class (correct class) of input pattern and the one from incorrect classes, are smoothed functions of the traditional 0–1 loss, while the direct minimization of the 0–1 loss is computationally intractable [19]. The MAXP algorithm [7], the SNPC method [17] and the RSLVQ [18] approximate the misclassification rate in the framework of Gaussian mixture. We will discuss the relationships between these algorithms under certain conditions.
We propose a new prototype learning algorithm by minimizing a conditional log-likelihood loss (CLL), called log-likelihood of margin (LOGM). The convexity of margin-based loss in LOGM ensures that there exists a unique maximum margin. We also add a regularization term to the loss function for avoiding [[over-fitting in training]] as well as maximizing the hypothesis margin. The proposed method can be easily extended to a weighted distance metric instead of the Euclidean distance to incorporate the relevance of features. We have extended the LOGM algorithm for prototype learning with both prototype-dependent weighting and prototype-independent weighting. Our empirical study on a large suite of benchmark datasets demonstrates that the LOGM is superior to the MCE, GLVQ, SNPC and RSLVQ methods, and moreover, the LOGM with prototype-dependent weighting achieves comparable accuracies with the support vector machine (SVM) classifier.
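The weighted-metric extension mentioned above replaces the Euclidean distance with a per-dimension weighted distance, where the weights are either shared across all prototypes (prototype-independent) or attached to each prototype (prototype-dependent). The sketch below only contrasts the two distance computations under assumed weight arrays `w` and `W`; the paper's weight-learning updates are not reproduced here.

```python
import numpy as np

def weighted_dist_independent(x, prototypes, w):
    """Prototype-independent weighting: one weight vector w of shape (d,),
    shared by all prototypes."""
    return np.sum(w * (prototypes - x) ** 2, axis=1)

def weighted_dist_dependent(x, prototypes, W):
    """Prototype-dependent weighting: one weight vector per prototype,
    W has shape (m, d)."""
    return np.sum(W * (prototypes - x) ** 2, axis=1)
```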
The rest of this paper is organized as follows: Section 2 gives the formulation of prototype learning algorithms; Section 3 briefly reviews the MCE and SNPC methods; Section 4 presents a prototype learning algorithm (LOGM) based on the margin-based conditional log-likelihood loss and extends the LOGM to prototype learning with weighted distance metric; Section 5 discusses the relationships between our algorithm and previous ones and their convergence; Section 6 presents our experimental results and the last section concludes the paper. This paper is an extension to our previous conference paper [20] by adding the discussions of relationships with other methods, the analysis of learning convergence and generalization, and the extension to prototype learning with weighted distance metric. Throughout the paper, we use many acronyms, which are listed in Table 1 for readers’ convenience.
Table 1 - List of acronyms

Acronym | Full name
---|---
CLL | Conditional Log-likelihood Loss
GLVQ | Generalized Learning Vector Quantization
GRLVQ | Generalized Relevance LVQ
LOGM | LOG-likelihood of Margin
LVQ | Learning Vector Quantization
MAXP | MAXimum class Probability
MAXP1 | A variant of MAXP
MCE | Minimum Classification Error
RBF | Radial Basis Function
RSLVQ | Robust Soft Learning Vector Quantization
RSLVQ1 | A variant of RSLVQ
SNPC | Soft Nearest Prototype Classifier
SVM | Support Vector Machine
References
- 1. Richard O. Duda, Peter E. Hart, David G. Stork, Pattern Classification (2nd Edition), Wiley-Interscience, 2000
- 2. Chin-Liang Chang, Finding Prototypes For Nearest Neighbor Classifiers, IEEE Transactions on Computers, v.23 n.11, p.1179-1184, November 1974 doi:10.1109/T-C.1974.223827
- 3. Kohonen, T., Improved Versions of Learning Vector Quantization. Neural Networks. V1 I17-21. 545-550.
- 4. A. Sato, K. Yamada, Generalized Learning Vector Quantization, in: Advances in Neural Information Processing Systems, 1995, Pp. 423-429.
- 5. J. C. Bezdek, T. R. Reichherzer, G. S. Lim, Y. Attikiouzel, Multiple-prototype Classifier Design, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, v.28 n.1, p.67-79, February 1998 doi:10.1109/5326.661091
- 6. L. I. Kuncheva, J. C. Bezdek, Nearest Prototype Classification: Clustering, Genetic Algorithms, Or Random Search?, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, v.28 n.1, p.160-164, February 1998 doi:10.1109/5326.661099
- 7. Liu, C.-L. and Nakagawa, M., Evaluation of Prototype Learning Algorithms for Nearest-neighbor Classifier in Application to Handwritten Character Recognition. Pattern Recognition. V34 I3. 601-615.
- 8. Martín-Valdivia, M.T., Vega, M.G. and López, L.A.U., LVQ for Text Categorization Using a Multilingual Linguistic Resource. Neurocomputing. V55 I3-4. 665-679.
- 9. P. Schneider, M. Biehl, B. Hammer, Advanced Metric Adaptation in Generalized LVQ for Classification of Mass Spectrometry Data, in: Proceeding of the Workshop on Self Organizing Maps, 2007.
- 10. T. Kohonen, J. Hynninen, J. Kangas, J. Laaksonen, K. Torkkola, LVQ PAK: The Learning Vector Quantization Program Package, Technical Report, Helsinki University of Technology, 1995.
- 11. K. Crammer, R. Gilad-Bachrach, A. Navot, N. Tishby, Margin Analysis of the LVQ Algorithm, in: Advances in Neural Information Processing Systems, 2002, Pp. 462-469.
- 12. A. K. Qin, P. N. Suganthan, Rapid and Brief Communication: Initialization Insensitive LVQ Algorithm based on Cost-function Adaptation, Pattern Recognition, v.38 n.5, p.773-776, May, 2005 doi:10.1016/j.patcog.2004.11.011
- 13. Barbara Hammer, Thomas Villmann, Generalized Relevance Learning Vector Quantization, Neural Networks, v.15 n.8-9, p.1059-1068, October 2002 doi:10.1016/S0893-6080(02)00079-5
- 14. Roberto Paredes, Enrique Vidal, Learning Prototypes and Distances: A Prototype Reduction Technique based on Nearest Neighbor Error Minimization, Pattern Recognition, v.39 n.2, p.180-188, February, 2006 doi:10.1016/j.patcog.2005.06.001
- 15. Carlos E. Pedreira, Learning Vector Quantization with Training Data Selection, IEEE Transactions on Pattern Analysis and Machine Intelligence, v.28 n.1, p.157-162, January 2006 doi:10.1109/TPAMI.2006.14
- 16. B.-H. Juang, S. Katagiri, Discriminative Learning for Minimum Error Classification [pattern Recognition], IEEE Transactions on Signal Processing, v.40 n.12, p.3043-3054, December 1992 doi:10.1109/78.175747
- 17. S. Seo, M. Bode, K. Obermayer, Soft Nearest Prototype Classification, IEEE Transactions on Neural Networks, v.14 n.2, p.390-398, March 2003 doi:10.1109/TNn.2003.809407
- 18. Sambu Seo, Klaus Obermayer, Soft Learning Vector Quantization, Neural Computation, v.15 n.7, p.1589-1604, July 2003 doi:10.1162/089976603321891819
- 19. Bartlett, P.L., Jordan, M.I. and McAuliffe, J.D., Convexity, Classification, and Risk Bounds. J. Am. Statist. Assoc. V101 I473. 138-156.
- 20. X. Jin, C.-L. Liu, X. Hou, Prototype Learning by Margin-based Conditional Log-likelihood Loss, In: Proceedings of the 19th ICPR, Tampa, USA, 2008.
- 21. Christopher M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer-Verlag New York, Inc., Secaucus, NJ, 2006
- 22. Hermann Ney, On the Probabilistic Interpretation of Neural Network Classifiers and Discriminative Training Criteria, IEEE Transactions on Pattern Analysis and Machine Intelligence, v.17 n.2, p.107-119, February 1995 doi:10.1109/34.368176
- 23. Daniel Grossman, Pedro Domingos, Learning Bayesian Network Classifiers by Maximizing Conditional Likelihood, Proceedings of the Twenty-first International Conference on Machine Learning, p.46, July 04-08, 2004, Banff, Alberta, Canada doi:10.1145/1015330.1015339
- 24. Cheng-Lin Liu, H. Sako, H. Fujisawa, Discriminative Learning Quadratic Discriminant Function for Handwriting Recognition, IEEE Transactions on Neural Networks, v.15 n.2, p.430-444, March 2004 doi:10.1109/TNn.2004.824263
- 25. Baras, J.S. and LaVigna, A., Convergence of Kohonen's Learning Vector Quantization. Neural Networks. V3 I17-21. 17-20.
- 26. E. B. Kosmatopoulos, M. A. Christodoulou, Convergence Properties of a Class of Learning Vector Quantization Algorithms, IEEE Transactions on Image Processing, v.5 n.2, p.361-368, February 1996 doi:10.1109/83.480771
- 27. Robbins, H. and Monro, S., A Stochastic Approximation Method. Ann. Math. Statist. V22. 400-407.
- 28. Léon Bottou, On-line Learning and Stochastic Approximations, On-line Learning in Neural Networks, Cambridge University Press, New York, NY, 1999
- 29. Vladimir N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag New York, Inc., New York, NY, 1995
- 30. Thorsten Joachims, Making Large-scale Support Vector Machine Learning Practical, Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, 1999
- 31. C. Blake, C. Merz, UCI Machine Learning Repository, University of California Irvine {http://www.ics.uci.edu/mlearn/MLRepository.html}, 1998.
- 32. K. Lang, Newsweeder: Learning to Filter Netnews, in: Proceedings of the 12th ICML, 1995, Pp. 331-339.
- 33. Porter, M., An Algorithm for Suffix Stripping. Program. V14 I3. 130-137.
- 34. Fabrizio Sebastiani, Machine Learning in Automated Text Categorization, ACM Computing Surveys (CSUR), v.34 n.1, p.1-47, March 2002 doi:10.1145/505282.505283
- 35. Yiming Yang, Jan O. Pedersen, A Comparative Study on Feature Selection in Text Categorization, Proceedings of the Fourteenth International Conference on Machine Learning, p.412-420, July 08-12, 1997
- 36. Nir Friedman, Dan Geiger, Moises Goldszmidt, Bayesian Network Classifiers, Machine Learning, v.29 n.2-3, p.131-163, Nov./Dec. 1997 doi:10.1023/A:1007465528199
- 37. Wang, W., Xu, Z., Lu, W. and Zhang, X., Determination of the Spread Parameter in the Gaussian Kernel for Classification and Regression. Neurocomputing. V55 I3-4. 643-663.
- 38. R. Collobert, S. Bengio, J. Mariéthoz, Torch: A Modular Machine Learning Software Library, Technical Report IDIAP-RR 02-46, IDIAP, 2002.
- 39. Janez Demšar, Statistical Comparisons of Classifiers over Multiple Data Sets, The Journal of Machine Learning Research, 7, p.1-30, 2006
Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year
---|---|---|---|---|---|---|---|---|---
Xiao-Bo Jin, Cheng-Lin Liu, Xinwen Hou | 43(7) | 2010 | Regularized Margin-based Conditional Log-likelihood Loss for Prototype Learning | | Pattern Recognition Journal | | 10.1016/j.patcog.2010.01.013 | | 2010