Random Forests Training System

From GM-RKB

A Random Forests Training System is a decision tree ensemble learning system that implements an RF training algorithm to solve an RF training task.



References

2017a

  • (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Random_forest Retrieved:2017-10-22.
    • Random forests or random decision forests[1][2] are an ensemble learning method for classification, regression and other tasks, that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set. The first algorithm for random decision forests was created by Tin Kam Ho using the random subspace method, which, in Ho's formulation, is a way to implement the "stochastic discrimination" approach to classification proposed by Eugene Kleinberg.[3][4][5]

      An extension of the algorithm was developed by Leo Breiman[6] and Adele Cutler,[7] and "Random Forests" is their trademark.[8] The extension combines Breiman's "bagging" idea and random selection of features, introduced first by Ho and later independently by Amit and Geman,[9] in order to construct a collection of decision trees with controlled variance.
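
The aggregation step described in the excerpt above (mode of the per-tree classes for classification, mean of the per-tree predictions for regression) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the code of any particular random forest implementation; the function names `aggregate_classification` and `aggregate_regression` and the sample values are hypothetical.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_predictions):
    """Return the most frequent class label among the per-tree predictions."""
    # most_common(1) yields [(label, count)] for the modal label.
    return Counter(tree_predictions).most_common(1)[0][0]


def aggregate_regression(tree_predictions):
    """Return the arithmetic mean of the per-tree numeric predictions."""
    return mean(tree_predictions)


# Hypothetical outputs of five classification trees for one input.
class_votes = ["cat", "dog", "cat", "cat", "dog"]
print(aggregate_classification(class_votes))

# Hypothetical outputs of three regression trees for one input.
value_estimates = [2.0, 3.0, 4.0]
print(aggregate_regression(value_estimates))
```

When two classes receive the same number of votes, this sketch returns whichever label was seen first; real implementations may break ties differently.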

2017b

2017c

  • (Scikit Learn, 2017) ⇒ http://scikit-learn.org/stable/modules/ensemble.html#random-forests Retrieved: 2017-10-22.
    • QUOTE: In random forests (see RandomForestClassifier and RandomForestRegressor classes), each tree in the ensemble is built from a sample drawn with replacement (i.e., a bootstrap sample) from the training set. In addition, when splitting a node during the construction of the tree, the split that is chosen is no longer the best split among all features. Instead, the split that is picked is the best split among a random subset of the features. As a result of this randomness, the bias of the forest usually slightly increases (with respect to the bias of a single non-random tree) but, due to averaging, its variance also decreases, usually more than compensating for the increase in bias, hence yielding an overall better model.

      In contrast to the original publication B2001, the scikit-learn implementation combines classifiers by averaging their probabilistic prediction, instead of letting each classifier vote for a single class.
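
The three ideas in the quote above (bootstrap sampling, a random feature subset per split, and averaging probabilistic predictions instead of counting votes) can be illustrated with a small standard-library sketch. It does not reproduce scikit-learn's code: the helper names (`bootstrap_sample`, `random_feature_subset`, `hard_vote`, `soft_vote`), the seed, and the per-tree probabilities are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (a bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices to consider at a single split."""
    return sorted(rng.sample(range(n_features), k))


def hard_vote(per_tree_probs):
    """Each tree votes for its most probable class; the majority class wins."""
    votes = [max(probs, key=probs.get) for probs in per_tree_probs]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(per_tree_probs):
    """Average the class probabilities over all trees, then take the largest."""
    n_trees = len(per_tree_probs)
    classes = per_tree_probs[0].keys()
    averaged = {c: sum(p[c] for p in per_tree_probs) / n_trees for c in classes}
    return max(averaged, key=averaged.get)


rng = random.Random(0)                 # fixed seed so the sketch is repeatable
rows = list(range(10))                 # stand-in for ten training examples
sample = bootstrap_sample(rows, rng)   # same size as rows, duplicates allowed
features = random_feature_subset(n_features=8, k=3, rng=rng)

# Hypothetical class-probability outputs of three trees for one input.
probs = [
    {"a": 0.55, "b": 0.45},
    {"a": 0.55, "b": 0.45},
    {"a": 0.10, "b": 0.90},
]
print(hard_vote(probs))  # two of the three trees favour "a"
print(soft_vote(probs))  # averaged probabilities favour "b" (0.40 vs 0.60)
```

The example is built so that the two combination rules disagree: two weakly confident trees outvote one strongly confident tree under per-tree voting, while averaging lets the confident tree dominate.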


  1. Ho, Tin Kam (1995). Random Decision Forests (PDF). Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, 14–16 August 1995. pp. 278–282.
  2. Ho, Tin Kam (1998). "The Random Subspace Method for Constructing Decision Forests" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 20 (8): 832–844. doi:10.1109/34.709601.
  3. Kleinberg, Eugene (1996). "An Overtraining-Resistant Stochastic Modeling Method for Pattern Recognition" (PDF). Annals of Statistics. 24 (6): 2319–2349. MR 1425956. doi:10.1214/aos/1032181157.
  4. Kleinberg, Eugene (2000). "On the Algorithmic Implementation of Stochastic Discrimination" (PDF). IEEE Transactions on PAMI. 22 (5).
  5. Kleinberg, Eugene. "Stochastic Discrimination and its Implementation".
  6. Breiman, Leo (2001). "Random Forests". Machine Learning. 45 (1): 5–32. doi:10.1023/A:1010933404324.
  7. Liaw, Andy (16 October 2012). "Documentation for R package randomForest" (PDF). Retrieved 15 March 2013.
  8. U.S. trademark registration number 3185828, registered 2006/12/19.
  9. Amit, Yali; Geman, Donald (1997). "Shape quantization and recognition with randomized trees" (PDF). Neural Computation. 9 (7): 1545–1588. doi:10.1162/neco.1997.9.7.1545.