Resource Overview
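I use a classic data set, the Iris flower data set, to show how to build a decision tree model. The package contains my code for data preprocessing, analysis, parameter tuning, model training, and the final analysis of the decision tree.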
Code Snippet and File Information
import numpy as np
from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plt
import pandas as pd
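# Load the Iris data set from the CSV file.
iris_data = pd.read_csv('iris.csv')
# Keep every non-setosa row, plus the setosa rows whose sepal width is at least 2.5.
iris_data = iris_data.loc[(iris_data['Species'] != 'setosa') | (iris_data['Sepal.Width'] >= 2.5)]
from sklearn.model_selection import train_test_split
# Prepare the inputs and labels; 75% of the data goes to the training set and 25% to the test set.
all_inputs = iris_data[['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width']]
all_classes = iris_data['Species'].values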
from sklearn.tree import DecisionTreeClassifier
# Use the parameters obtained in the earlier tuning step.
# min_impurity_split=None and presort=False are omitted: both were no-op values
# here, and both arguments have been removed from recent scikit-learn releases.
clf = DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=2,
                             max_features=3, max_leaf_nodes=None, min_impurity_decrease=0.0,
                             min_samples_leaf=1,
                             min_samples_split=2, min_weight_fraction_leaf=0.0,
                             random_state=None, splitter='best')
(X_train, X_test, Y_train, Y_test) = train_test_split(all_inputs, all_classes, test_size=0.25, random_state=0)
clf.fit(X_train, Y_train)
from IPython.display import Image
from sklearn import tree
import pydotplus
# Export the fitted tree in Graphviz DOT format.
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width'],  # names of the features
                                class_names=['Setosa', 'Versicolour', 'Virginica'],  # names of the classes
                                filled=True, rounded=True,
                                special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_png('flower-de-luce.png')  # save the image
Image(graph.create_png())
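The snippet above reuses parameters found in an earlier tuning step and builds a test set that it never evaluates. As a rough illustration only, the sketch below shows one way such a search and evaluation could be done with scikit-learn's GridSearchCV. It makes several assumptions: it loads scikit-learn's bundled Iris data through load_iris rather than iris.csv, the search grid is hypothetical, and it is not the content of getParameters.py, which is not shown on this page.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: use scikit-learn's bundled copy of Iris instead of iris.csv.
X, y = load_iris(return_X_y=True)

# Same 75% / 25% split and random_state as in the snippet above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical search grid; the original tuning script may use a different one.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [1, 2, 3, 4, 5],
    "max_features": [1, 2, 3, 4],
}

# 5-fold cross-validated grid search on the training set only.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
)
search.fit(X_train, y_train)

# The best estimator is refit on the whole training set, so it can be
# scored directly on the held-out test set.
best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print("Best parameters:", best_params)
print("Test accuracy: %.3f" % test_accuracy)
```

The values in best_params could then be passed to DecisionTreeClassifier in place of the hard-coded arguments. Fixing random_state in the classifier keeps the search reproducible, since max_features below 4 makes each split consider a random subset of the features.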
Attributes       Size  Date       Time   Name
-----------  --------  ---------- -----  ----
.CA....          1683  2020-03-11 21:03  iris\GeneratingDecisionTree.py
.CA....          1396  2020-03-11 20:48  iris\getParameters.py
.CA....          4978  2020-03-11 16:30  iris\iris.csv
.CA....           890  2020-03-11 20:23  iris\preprocessing.py
.CA....          1272  2020-03-11 21:10  iris\printFeature.py
.CA....          1397  2020-03-11 20:48  iris\use.py
.C.D...             0  2020-03-11 21:19  iris
-----------  --------  ---------- -----  ----
                11616                    7