Random forest is one of the simplest ensemble learning algorithms. Its core idea is "two kinds of randomness plus many CART trees": draw n samples from the training set with replacement (bootstrap sampling), randomly select k of the features, and repeat this procedure to build m decision trees; the final class of a sample is then decided by majority vote among the trees. As in earlier posts, this article uses a Sklearn dataset to compare the classification performance of a random forest against a single decision tree.
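Conceptually, such a forest can be sketched in a few lines on top of a single-tree learner. The sketch below is only illustrative and not the article's code: the names simple_forest, n_trees and n_sub_features are made up here, and sklearn's RandomForestClassifier actually re-draws the feature subset at every split rather than once per tree.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def simple_forest(x, y, x_new, n_trees=10, n_sub_features=5):
    rng = np.random.default_rng(0)
    votes = []
    for _ in range(n_trees):
        rows = rng.integers(0, len(x), size=len(x))                   # randomness 1: bootstrap-sample rows with replacement
        cols = rng.choice(x.shape[1], n_sub_features, replace=False)  # randomness 2: random subset of features
        tree = DecisionTreeClassifier().fit(x[rows][:, cols], y[rows])
        votes.append(tree.predict(x_new[:, cols]))
    votes = np.array(votes)                                           # shape: (n_trees, len(x_new))
    return np.array([np.bincount(v).argmax() for v in votes.T])       # majority vote per sample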
Code Example
1. Import and split the dataset
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# load the wine dataset (178 samples, 13 features, 3 classes)
wine = load_wine()
# split into training and test sets (default test_size is 0.25)
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target)
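One caveat: train_test_split shuffles the data, so the scores reported below will vary from run to run. If reproducibility matters, the split can be fixed, for example as follows (random_state=42 and stratify are my additions, not part of the original code):

# optional: fix the split and keep class proportions balanced
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, stratify=wine.target, random_state=42)
print(x_train.shape, x_test.shape)   # (133, 13) (45, 13) for the 178-sample wine data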
2. Build the decision tree and random forest models
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# a single decision tree as the baseline
clf = DecisionTreeClassifier(random_state=42)
clf.fit(x_train, y_train)
score_c = clf.score(x_test, y_test)

# a random forest trained on the same split
rfc = RandomForestClassifier(random_state=42)
rfc.fit(x_train, y_train)
score_r = rfc.score(x_test, y_test)

print('single tree:', score_c, 'random forest:', score_r)
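On a single split the forest usually scores at least as well as the lone tree. Because the forest is just a collection of CART trees, the fitted rfc also exposes its member trees and the averaged feature importances; inspecting them is a quick sanity check (this step is my addition, not part of the original article):

print(len(rfc.estimators_))                      # number of trees, 100 by default in current sklearn
for name, imp in zip(wine.feature_names, rfc.feature_importances_):
    print(f'{name}: {imp:.3f}')                  # importance averaged over all trees in the forest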
3. Compare the two with cross-validation
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt

# 10-fold cross-validation on the full dataset for each model
clf = DecisionTreeClassifier()
score_c = cross_val_score(clf, wine.data, wine.target, cv=10)

rfc = RandomForestClassifier()
score_r = cross_val_score(rfc, wine.data, wine.target, cv=10)

# plot the per-fold accuracy of both models
plt.plot(range(1, 11), score_c, label='decision tree')
plt.plot(range(1, 11), score_r, label='random forest')
plt.legend()
plt.show()
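A single 10-fold run is still fairly noisy. A common follow-up, not shown in the original article, is to repeat the cross-validation several times and compare the mean accuracy of the two models; the sketch below reuses the imports from the previous steps:

import numpy as np

# repeat the 10-fold cross-validation ten times and record the mean accuracy of each run
tree_means, forest_means = [], []
for _ in range(10):
    tree_means.append(cross_val_score(DecisionTreeClassifier(), wine.data, wine.target, cv=10).mean())
    forest_means.append(cross_val_score(RandomForestClassifier(), wine.data, wine.target, cv=10).mean())

plt.plot(range(1, 11), tree_means, label='decision tree')
plt.plot(range(1, 11), forest_means, label='random forest')
plt.legend()
plt.show()
print('tree mean:', np.mean(tree_means), 'forest mean:', np.mean(forest_means))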
This article is an original work by 陈华; reposting is welcome, but please credit the source: http://edu.ichenhua.cn/read/258