Negative Correlation Learning Task
A Negative Correlation Learning Task is an Ensemble Learning Task that introduces a correlation penalty term into the cost function of each individual member of an ensemble.
- AKA: NC-Learning, NCL.
- Context:
- It can be solved by a Negative Correlation Learning System (by implementing a Negative Correlation Learning Algorithm).
- It can range from being a Supervised NCL Task, to being a Semisupervised NCL Task, to being an Unsupervised NCL Task.
- Example(s):
- AdaBoost-NC,
- SemiNCL.
- …
- Counter-Example(s):
- See: Neural Network Ensemble, Negative Correlation Learning, Generalisation, Correlation, Combination Method, Correct Response Set, Machine Learning Regression, Machine Learning Classification.
References
2019
- (Scholarpedia, 2019) ⇒ Huanhuan Chen, and Xin Yao (2019). “Negatively Correlated Ensemble Learning.” In: Scholarpedia. http://www.scholarpedia.org/article/User:Ke_CHEN/Proposed/Negatively_Correlated_Ensemble_Learning. Retrieved: 2019-02-15.
- QUOTE: As an effective approach to improve the generalization of supervised classifiers, ensembles of multiple learning machines, i.e. groups of learners that work together as a committee, have attracted a lot of research interest in the machine learning community. Most existing ensemble methods, such as Bagging, ensembles of features and Random Forests, always train ensemble members independently. In this situation, the interaction and cooperation among the individual members in the ensemble may not be fully exploited.
Negative Correlation Learning (NCL) (Liu and Yao, 1999a and 1999b) is a specific ensemble method, which emphasizes interaction and cooperation among individual members in the ensemble. It uses a penalty term in the error function to produce biased individual learners whose errors tend to be negatively correlated. Specifically, NCL introduces a correlation penalty term to the cost function of each individual network so that each neural network minimizes its mean-square-error (MSE) together with the error correlation within the ensemble. This encourages diversity, which is essential for good ensemble performance (Brown et al., 2005). This article summarizes work on NCL, including the problem formulation, the training algorithms, the parameter selection algorithm, the ensemble selection and combination methods and some variants of NCL for specific applications.
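The quoted description can be made concrete with a small sketch. Below is a minimal, illustrative NumPy implementation of NCL for regression (not the authors' code): a few linear members are trained jointly, each on its squared error plus a correlation penalty that pushes its output away from the ensemble mean. The ensemble size, penalty weight, feature map, and other names here are assumptions made only for the example.

```python
# Minimal NCL sketch (illustrative only, not the quoted authors' code).
# Assumed setup: a small ensemble of linear regressors on synthetic 1-D data;
# names such as M, lam, lr, Phi are hypothetical choices for this example.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: d = sin(3x) + noise.
X = rng.uniform(-1.0, 1.0, size=(200, 1))
d = np.sin(3.0 * X[:, 0]) + 0.1 * rng.normal(size=200)

M, lam, lr = 5, 0.5, 0.05                 # ensemble size, NCL penalty weight, step size
Phi = np.hstack([X, np.ones((200, 1))])   # linear features [x, 1] per example
W = rng.normal(scale=0.1, size=(M, Phi.shape[1]))  # one weight row per member

for epoch in range(500):
    F = Phi @ W.T                          # f_i(x) for every member, shape (n, M)
    f_bar = F.mean(axis=1, keepdims=True)  # ensemble output, the uniform average
    # Per-member NCL error: 1/2 (f_i - d)^2 - lam * (f_i - f_bar)^2.
    # Gradient w.r.t. f_i, treating f_bar as a constant (a common simplification):
    grad_f = (F - d[:, None]) - 2.0 * lam * (F - f_bar)
    W -= lr * (grad_f.T @ Phi) / len(X)    # simultaneous gradient step for all members

ensemble_pred = (Phi @ W.T).mean(axis=1)
print("ensemble MSE:", np.mean((ensemble_pred - d) ** 2))
```

With lam = 0 the members decouple and are trained independently on their own squared errors; increasing lam trades individual accuracy for error decorrelation within the ensemble, which is the interaction the quoted passage emphasizes.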
2018
- (Chen et al., 2018) ⇒ Huanhuan Chen, Bingbing Jiang, and Xin Yao. (2018). “Semisupervised Negative Correlation Learning.” In: IEEE Transactions on Neural Networks and Learning Systems. doi:10.1109/TNNLS.2017.2784814
- ABSTRACT: Negative correlation learning (NCL) is an ensemble learning algorithm that introduces a correlation penalty term to the cost function of each individual ensemble member. Each ensemble member minimizes its mean square error and its error correlation with the rest of the ensemble. This paper analyzes NCL and reveals that adopting a negative correlation term for unlabeled data is beneficial to improving the model performance in the semisupervised learning (SSL) setting. We then propose a novel SSL algorithm, Semisupervised NCL (SemiNCL) algorithm. The algorithm considers the negative correlation terms for both labeled and unlabeled data for the semisupervised problems. In order to reduce the computational and memory complexity, an accelerated SemiNCL is derived from the distributed least square algorithm. In addition, we have derived a bound for two parameters in SemiNCL based on an analysis of the Hessian matrix of the error function. The new algorithm is evaluated by extensive experiments with various ratios of labeled and unlabeled training data. Comparisons with other state-of-the-art supervised and semisupervised algorithms confirm that SemiNCL achieves the best overall performance.
2017
- (Sammut & Webb, 2017) ⇒ “Negative Correlation Learning.” In: Claude Sammut, and Geoffrey I. Webb (editors), “Encyclopedia of Machine Learning and Data Mining.” DOI:10.1007/978-1-4899-7687-1_956
- QUOTE: Negative correlation learning (Liu and Yao 1999) is an ensemble learning technique. It can be used for regression or classification problems, though with classification problems the models must be capable of producing posterior probabilities. The model outputs are combined with a uniformly weighted average. The squared error is augmented with a penalty term which takes into account the diversity of the ensemble. The error for the ith model is,
[math]\displaystyle{ E \left(f_i(x)\right)=\dfrac{1}{2} \left(f_i(x)-d \right)^2 - \lambda \left(f_i(x)-\hat{f}(x)\right)^2 }[/math].
The coefficient [math]\displaystyle{ \lambda }[/math] determines the balance between optimizing individual accuracy, and optimizing ensemble diversity. With [math]\displaystyle{ \lambda= 0 }[/math], the models are trained independently, with no emphasis on diversity. With [math]\displaystyle{ \lambda=1 }[/math], the models are tightly coupled, and the ensemble is trained as a single unit. Theoretical studies (Brown et al. 2006) have shown that NC works by directly optimizing the bias-variance-covariance trade-off, thus it explicitly manages the ensemble diversity. When the complexity of the individuals is sufficient to have high individual accuracy, NC provides little benefit. When the complexity is low, NC with a well-chosen [math]\displaystyle{ \lambda }[/math] can provide significant performance improvements. Thus the best situation to make use of the NC framework is with a large number of low accuracy models.
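A short note on how the quoted penalty couples the members (this derivation is not part of the quoted entry): treating the ensemble average [math]\displaystyle{ \hat{f}(x) }[/math] as a constant with respect to [math]\displaystyle{ f_i(x) }[/math], as is commonly done in NCL implementations, the gradient of the error above is
[math]\displaystyle{ \dfrac{\partial E\left(f_i(x)\right)}{\partial f_i(x)} = \left(f_i(x)-d\right) - 2\lambda\left(f_i(x)-\hat{f}(x)\right) }[/math].
With [math]\displaystyle{ \lambda= 0 }[/math] each member follows only its own error signal, while a positive [math]\displaystyle{ \lambda }[/math] adds a term that pushes [math]\displaystyle{ f_i(x) }[/math] away from the ensemble mean, which is what decorrelates the individual errors.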
2010
- (Wang & Yao, 2010) ⇒ Shuo Wang, and Xin Yao. (2010). “The Effectiveness of a New Negative Correlation Learning Algorithm for Classification Ensembles.” In: Proceedings of the 2010 IEEE International Conference on Data Mining Workshops. ISBN:978-0-7695-4257-7 doi:10.1109/ICDMW.2010.196
- QUOTE: Since diversity has been recognized as a main reason for the success of an ensemble model, a number of researchers are encouraged to develop negative correlation learning (NCL) algorithms and related theoretical studies. Aiming at better generalization performance, NCL is an ensemble learning technique that takes into account the diversity of the ensemble explicitly during training, so as to balance individual errors and the ensemble covariance.
1999
- (Liu & Yao, 1999) ⇒ Yong Liu, and Xin Yao. (1999). “Ensemble Learning via Negative Correlation.” In: Neural Networks, 12(10). doi:10.1016/S0893-6080(99)00073-8
- QUOTE: The idea behind negative correlation learning is to encourage different individual networks in an ensemble to learn different parts or aspects of the training data so that the ensemble can learn the whole training data better (...)
Negative correlation learning is also different from the mixtures-of-experts (ME) architecture (Jacobs, 1997) that consists of a gating network and a number of expert networks although ME architecture can also produce biased individual networks whose estimates are negatively correlated. Negative correlation learning does not need a separate gating network. It uses a totally different error function. The [math]\displaystyle{ \lambda }[/math] parameter in negative correlation learning provides a convenient way to balance the bias-variance-covariance trade-off (...).
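For context on the trade-off named in this quote (a standard decomposition discussed, e.g., by Brown et al., 2005, rather than something taken from the quoted paper): the expected squared error of a uniformly averaged ensemble of [math]\displaystyle{ M }[/math] members can be written as
[math]\displaystyle{ E\left[\left(\hat{f}-d\right)^2\right] = \overline{bias}^2 + \dfrac{1}{M}\,\overline{var} + \left(1-\dfrac{1}{M}\right)\overline{covar}, }[/math]
where [math]\displaystyle{ \overline{bias} }[/math], [math]\displaystyle{ \overline{var} }[/math] and [math]\displaystyle{ \overline{covar} }[/math] are the averaged bias, variance, and pairwise covariance of the members. The [math]\displaystyle{ \lambda }[/math] penalty acts mainly on the covariance term: driving the members' errors to be negatively correlated reduces [math]\displaystyle{ \overline{covar} }[/math], typically at the cost of some increase in the individual bias and variance terms.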