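The promised PCA post is on hold for now because it isn't finished yet; it should go up tomorrow or so. In the meantime, here is one of the bagging algorithms commonly seen in machine learning competitions: random forest. It should be read together with the earlier post on decision trees.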
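A random forest is an ensemble classifier made up of multiple decision trees, each built with randomness. Its predicted class is decided by a vote among the trees (for regression trees, the outputs are averaged instead). Suppose there are n samples in total and each sample has a features. Each tree in the forest is then generated as follows: rows are drawn at random with replacement from the n samples to form that tree's training set, a random subset of the a features is selected, and a decision tree is trained on the resulting subset.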
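The two random draws behind each tree can be sketched in isolation. This is a minimal illustration rather than the post's implementation: `draw_subset` is a hypothetical helper name, and using √a features per tree is one common default that is assumed here.

```python
import random


def draw_subset(X, y, n_features, rng=random):
    """Return a bootstrap sample of rows restricted to a random feature subset.

    Hypothetical helper, for illustration only.
    """
    n = len(X)      # number of samples
    a = len(X[0])   # number of features per sample
    # Randomness 1: draw n row indices with replacement (bootstrap).
    rows = rng.choices(range(n), k=n)
    # Randomness 2: draw n_features distinct column indices.
    cols = sorted(rng.sample(range(a), n_features))
    X_sub = [[X[i][j] for j in cols] for i in rows]
    y_sub = [y[i] for i in rows]
    return X_sub, y_sub, cols


rng = random.Random(0)  # seeded so the draw is repeatable
X = [[i, i * 2, i * 3, i * 4] for i in range(6)]
y = [0, 0, 0, 1, 1, 1]
X_sub, y_sub, cols = draw_subset(X, y, n_features=int(len(X[0]) ** 0.5), rng=rng)
print(cols, len(X_sub), len(y_sub))
```

Because rows are drawn with replacement, some samples typically appear more than once in a tree's training set while others are left out entirely, which is part of what makes the trees differ.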
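These two sources of randomness make the decision trees in the forest differ from one another, which increases the diversity of the ensemble and in turn improves classification performance.

Advantages of random forests: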
Disadvantages of random forests:
The implementation builds on the decision tree from the previous post. The code is as follows:
# coding=utf-8
import decision_tree
from decision_tree import DecisionTree
from random import sample, choices, choice


class RandomForest(object):
    def __init__(self):
        self.trees = None          # fitted DecisionTree objects
        self.tree_features = None  # column indices used by each tree

    # NOTE: the numeric defaults for n_estimators, max_depth and
    # min_samples_split are missing from the source listing; 10, 3 and 2
    # are placeholder assumptions.
    def fit(self, X, y, n_estimators=10, max_depth=3, min_samples_split=2,
            max_features=None, n_samples=None):
        self.trees = []
        self.tree_features = []
        for _ in range(n_estimators):
            m = len(X[0])  # number of features
            n = len(y)     # number of samples
            # Row sampling: draw with replacement when n_samples is given,
            # otherwise use every row.
            if n_samples:
                idx = choices(population=range(n), k=min(n, n_samples))
            else:
                idx = range(n)
            # Column sampling: cap at max_features, defaulting to sqrt(m).
            if max_features:
                n_features = min(m, max_features)
            else:
                n_features = int(m ** 0.5)
            # Pick a random number of features between 1 and n_features.
            features = sample(range(m), choice(range(1, n_features + 1)))
            X_sub = [[X[i][j] for j in features] for i in idx]
            y_sub = [y[i] for i in idx]
            clf = DecisionTree()
            clf.fit(X_sub, y_sub, max_depth, min_samples_split)
            self.trees.append(clf)
            self.tree_features.append(features)

    def _predict(self, Xi):
        # Count the trees whose score for this sample is at least 0.5.
        pos_vote = 0
        for tree, features in zip(self.trees, self.tree_features):
            score = tree._predict([Xi[j] for j in features])
            if score >= 0.5:
                pos_vote += 1
        neg_vote = len(self.trees) - pos_vote
        if pos_vote > neg_vote:
            return 1
        elif pos_vote < neg_vote:
            return 0
        else:
            # Tie: pick a class at random.
            return choice([0, 1])

    def predict(self, X):
        return [self._predict(Xi) for Xi in X]
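

@decision_tree.run_time
def main():
    print("Testing the performance of RandomForest...")
    # Load data
    X, y = decision_tree.load_data()
    # Split data randomly, train set rate 70%
    # NOTE: the random_state value is missing from the source listing;
    # 40 is a placeholder assumption.
    X_train, X_test, y_train, y_test = decision_tree.train_test_split(
        X, y, random_state=40)
    # Train model
    # NOTE: the n_samples, max_depth and n_estimators values are missing from
    # the source listing; 300, 3 and 20 are placeholder assumptions.
    rf = RandomForest()
    rf.fit(X_train, y_train, n_samples=300, max_depth=3, n_estimators=20)
    # Model evaluation
    y_hat = rf.predict(X_test)
    acc = decision_tree.get_acc(y_test, y_hat)
    print("Accuracy is %.3f" % acc)


if __name__ == "__main__":
    main()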
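
The `_predict` method in the listing handles the binary case by thresholding each tree's score and counting votes. The more general aggregation rule mentioned at the start of the post (a vote for classification, an average for regression) might be sketched as below. The function names `majority_vote` and `mean_prediction` are hypothetical and are not part of the `decision_tree` module.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label among the trees' predictions."""
    # Ties go to the label seen first here; the listing in this post
    # breaks ties at random instead.
    return Counter(labels).most_common(1)[0][0]


def mean_prediction(values):
    """Average the trees' numeric predictions (regression forests)."""
    return sum(values) / len(values)


print(majority_vote([1, 0, 1, 1, 0]))    # 1
print(mean_prediction([2.0, 3.0, 4.0]))  # 3.0
```

Unlike the thresholded count, `majority_vote` also works when there are more than two classes.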
https://blog.csdn.net/login_sonata/article/details/73929426
This article is shared from the GiantPandaCV WeChat official account.