读童话的狼

2020-09-16   阅读量: 983

随机森林中的oob误差估计

扫码加入数据分析学习群

一个大小为N的数据集D. 对数据集随机有放回抽样N次作为一棵CART树的训练集.

根据概率论,可知数据集中有大约1/3的数据是没有被选取的(称为Out of bag),所以就是这没被选取的部分作为小树的测试集.


数据集D中的每一个样本都可以拿来做测试数据, 对于一个样本d, 森林中大约有1/e树是OOB的, 那么这1/e的树就构成了预测样本d的森林,用简单投票法计算分类结果. 从而得到总的error.

(Predict probability for each possible outcome.

Compute the probability estimates for each single sample in X and each possible outcome seen during training (categorical distribution).Put each case left out in the construction of the kth tree down the kth tree to get a classification. In this way, a test set classification is obtained for each case in about one-third of the trees. At the end of the run, take j to be the class that got most of the votes every time case n was oob. The proportion of times that j is not equal to the true class of n averaged over all cases is the oob error estimate. This has proven to be unbiased in many tests)


37.9039 2 0 关注作者 收藏

评论(0)


暂无数据

推荐课程