Wagging (Weight Aggregation) Algorithm
A Wagging (Weight Aggregation) Algorithm is a Bagging Algorithm variant that repeatedly perturbs the training set by adding Gaussian noise to the instance weights instead of sampling from it.
- AKA: Wagging Algorithm, Wagging.
- Context:
- It was initially developed by Bauer & Kohavi (1999).
- Example(s):
- the algorithm introduced by Bauer & Kohavi (1999),
- the algorithm described in Webb (2000),
- …
- Counter-Example(s):
- See: Ensemble Algorithm, Probabilistic Estimate, Bias-Variance Decomposition, Voting Algorithm, Naive-Bayes Inducer.
References
2000
- (Webb, 2000) ⇒ Geoffrey I. Webb. (2000). “MultiBoosting: A Technique for Combining Boosting and Wagging.” In: Machine Learning, 40(2). doi:10.1023/A:1007659514849.
- QUOTE: Wagging (Bauer & Kohavi, 1999) is variant of bagging, that requires a base learning algorithm that can utilize training cases with differing weights. Rather than using random bootstrap samples to form the successive training sets, wagging assigns random weights to the cases in each training set. Bauer and Kohavi’s (1999) original formulation of wagging used Gaussian noise to vary the instance weights. However, this can lead to some instance weights being reduced to zero, effectively removing them from the training set. Instead, following a suggestion from Quinlan (personal communication, May 1998) the new technique uses the continuous Poisson distribution[1] to assign random instance weights. As the assignment of instance weights by bagging can be modeled by the discrete Poisson distribution, use of the continuous Poisson distribution can be viewed as assigning instance weights using an equivalent distribution to bagging, but over a continuous rather than discrete space.
- ↑ The continuous Poisson distribution is more commonly known as the exponential distribution.
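- The following is a minimal sketch of the exponential ("continuous Poisson") weight-assignment variant described in the quoted passage above. It assumes a scikit-learn-style base learner that accepts a sample_weight argument; the function names, the choice of DecisionTreeClassifier, and the mean-1 exponential scale are illustrative assumptions, not code from the cited papers.
<pre>
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def wag_exponential(X, y, n_members=10, random_state=None):
    """Train an ensemble by re-weighting (not re-sampling) the training set."""
    rng = np.random.default_rng(random_state)
    members = []
    for _ in range(n_members):
        # Mean-1 exponential weights play the role that the Poisson(1)
        # resampling counts play in bagging, but over a continuous space.
        weights = rng.exponential(scale=1.0, size=len(y))
        members.append(DecisionTreeClassifier().fit(X, y, sample_weight=weights))
    return members

def wag_predict(members, X):
    """Combine the members by unweighted majority vote (assumes integer class labels)."""
    votes = np.stack([clf.predict(X) for clf in members])  # shape: (n_members, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])
</pre>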
1999
- (Bauer & Kohavi, 1999) ⇒ Eric Bauer, and Ron Kohavi. (1999). “An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants.” In: Machine Learning, 36(1-2). doi:10.1023/A:1007515423169.
- QUOTE: An interesting variant of Bagging that we tried is called Wagging (Weight Aggregation). This method seeks to repeatedly perturb the training set as in Bagging, but instead of sampling from it, Wagging adds Gaussian noise to each weight with mean zero and a given standard deviation (e.g., 2). For each trial, we start with uniformly weighted instances, add noise to the weights, and induce a classifier. The method has the nice property that one can trade off bias and variance: by increasing the standard deviation of the noise we introduce, more instances will have their weight decrease to zero and disappear, thus increasing bias and reducing variance. Experiments showed that with a standard deviation of 2–3, the method finishes head-to-head with the best variant of Bagging used above, i.e., the error of Bagged MC4 without pruning and with scoring was 10.21% and the errors for Wagging with 2, 2.5, and 3 were 10.19, 10.16, and 10.12%. These differences are not significant. Results for Naive-Bayes were similar.
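- The following is a minimal sketch of the Gaussian-noise formulation described in the quote above: start from uniform weights, add zero-mean Gaussian noise with a chosen standard deviation (e.g., 2), and clip negative weights to zero so those instances effectively drop out. The scikit-learn base learner, the function name, and the default parameters are illustrative assumptions, not Bauer and Kohavi's original implementation.
<pre>
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def wag_gaussian(X, y, n_members=10, noise_std=2.0, random_state=None):
    """Train a wagging ensemble using Gaussian-noise instance weights."""
    rng = np.random.default_rng(random_state)
    members = []
    base_weight = 1.0  # uniform starting weight for every instance
    for _ in range(n_members):
        # A larger noise_std pushes more weights to zero (instances disappear),
        # which increases bias and reduces variance, as noted in the quote.
        noise = rng.normal(loc=0.0, scale=noise_std, size=len(y))
        weights = np.clip(base_weight + noise, 0.0, None)
        members.append(DecisionTreeClassifier().fit(X, y, sample_weight=weights))
    return members
</pre>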