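The iris (scientific name: Iris tectorum Maxim.) belongs to the order Liliales, family Iridaceae. It is grown as an ornamental plant, its light fragrance is used in perfumes, and its rhizome, which can be harvested year-round, is used in traditional Chinese medicine for its anti-inflammatory effect.
The data set covers three iris species: setosa, versicolor, and virginica. Classification relies on four measurements: petal length, petal width, sepal length, and sepal width (all in centimeters).
This article builds a basic machine learning model that predicts the species of an iris from these four measurements, which makes it a three-class classification problem. The data set comes from: http://archive.ics.uci.edu/ml/datasets/Iris
We start by inspecting the data set and doing some simple preprocessing.
First, import pandas: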
import pandas as pd
Place the downloaded iris.data file in the same folder as the code.
Read the data and print the first few rows:
data = pd.read_csv('iris.data')
print(data.head())
Output:
5.1 3.5 1.4 0.2 Iris-setosa
0 4.9 3.0 1.4 0.2 Iris-setosa
1 4.7 3.2 1.3 0.2 Iris-setosa
2 4.6 3.1 1.5 0.2 Iris-setosa
3 5.0 3.6 1.4 0.2 Iris-setosa
4 5.4 3.9 1.7 0.4 Iris-setosa
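Note that the file has no header row, so read_csv has used the first sample (5.1, 3.5, 1.4, 0.2, Iris-setosa) as the column names. We assign descriptive column names instead. Be aware that this leaves 149 rows; passing header=None to read_csv would keep all 150.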
data.columns = ['sepal_len', 'sepal_width', 'petal_len', 'petal_width', 'class']
print(data.head())
Output:
sepal_len sepal_width petal_len petal_width class
0 4.9 3.0 1.4 0.2 Iris-setosa
1 4.7 3.2 1.3 0.2 Iris-setosa
2 4.6 3.1 1.5 0.2 Iris-setosa
3 5.0 3.6 1.4 0.2 Iris-setosa
4 5.4 3.9 1.7 0.4 Iris-setosa
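Because iris.data has no header row, a plain read_csv call consumes the first sample as column names, which is why the tables above start at 4.9 rather than 5.1. The following is a minimal sketch of one way to keep every row, by passing the column names directly to read_csv. The in-memory string is an assumption that stands in for the first lines of the real file.

```python
import io
import pandas as pd

# Stand-in for the first lines of iris.data (the real file has 150 rows).
raw = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)
columns = ['sepal_len', 'sepal_width', 'petal_len', 'petal_width', 'class']

# names= supplies the column labels and tells pandas the data has no header row.
full = pd.read_csv(raw, names=columns)
print(full.shape)
print(full.iloc[0].tolist())
```

With the real file, replacing raw by 'iris.data' should work the same way. The rest of the article keeps the 149-row frame, so its printed results are based on that version.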
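Before building the model, a brief introduction to Pipeline:
Pipeline chains several estimators into a single workflow, for example feature extraction, normalization, and classification, which together form a typical machine learning workflow. This brings two main benefits: a single call to fit or predict runs every step in order, and the parameters of all steps can be selected together, for example with grid search.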
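The chaining described above can be sketched as follows. This is an illustration rather than the article's model: it loads the data through load_iris instead of iris.data, and the candidate values for C and the max_iter setting are assumptions chosen only for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ('sc', StandardScaler()),
    ('clf', LogisticRegression(max_iter=1000)),
])

# One fit call runs the scaler first and then the classifier.
pipe.fit(X, y)
train_acc = pipe.score(X, y)

# Parameters of a step are addressed as <step name>__<parameter name>.
search = GridSearchCV(pipe, {'clf__C': [0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
print(train_acc, search.best_params_)
```

Because the scaler sits inside the pipeline, each cross-validation fold refits it on that fold's training part only, so no statistics leak from the validation part.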
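Now build the model:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Assumed setup (not shown in the original): the first four columns are the features, the last is the label
x = data.iloc[:, :4].values
y = data.iloc[:, 4].values
# Keep only the first two features (sepal length and sepal width)
x = x[:, :2]
lr = Pipeline([('sc', StandardScaler()),
               ('poly', PolynomialFeatures(degree=3)),
               ('clf', LogisticRegression())])
# StandardScaler: learns the mean and standard deviation of the training data so the same transform can be applied to test data
# PolynomialFeatures: expands the features into polynomial terms; degree sets the polynomial degree
# LogisticRegression: the logistic regression classifier
lr.fit(x, y.ravel())
# ravel() flattens y into a 1-D array
y_hat = lr.predict(x)
y_hat_prob = lr.predict_proba(x)
# Print floats in fixed-point rather than scientific notation
np.set_printoptions(suppress=True)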
Print the predictions:
print('y_hat = \n', y_hat)
Output:
y_hat =
['Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-virginica'
'Iris-virginica' 'Iris-virginica' 'Iris-versicolor' 'Iris-virginica'
'Iris-versicolor' 'Iris-virginica' 'Iris-versicolor' 'Iris-virginica'
'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor'
'Iris-versicolor' 'Iris-versicolor' 'Iris-virginica' 'Iris-versicolor'
'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor'
'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor' 'Iris-virginica'
'Iris-virginica' 'Iris-virginica' 'Iris-virginica' 'Iris-versicolor'
'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor'
'Iris-versicolor' 'Iris-versicolor' 'Iris-virginica' 'Iris-virginica'
'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor'
'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor'
'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor'
'Iris-versicolor' 'Iris-virginica' 'Iris-versicolor' 'Iris-virginica'
'Iris-versicolor' 'Iris-virginica' 'Iris-virginica' 'Iris-versicolor'
'Iris-virginica' 'Iris-virginica' 'Iris-virginica' 'Iris-virginica'
'Iris-virginica' 'Iris-virginica' 'Iris-versicolor' 'Iris-versicolor'
'Iris-virginica' 'Iris-virginica' 'Iris-virginica' 'Iris-virginica'
'Iris-versicolor' 'Iris-virginica' 'Iris-versicolor' 'Iris-virginica'
'Iris-versicolor' 'Iris-virginica' 'Iris-virginica' 'Iris-versicolor'
'Iris-versicolor' 'Iris-virginica' 'Iris-virginica' 'Iris-virginica'
'Iris-virginica' 'Iris-virginica' 'Iris-versicolor' 'Iris-versicolor'
'Iris-virginica' 'Iris-virginica' 'Iris-virginica' 'Iris-versicolor'
'Iris-virginica' 'Iris-virginica' 'Iris-virginica' 'Iris-versicolor'
'Iris-virginica' 'Iris-virginica' 'Iris-virginica' 'Iris-versicolor'
'Iris-virginica' 'Iris-virginica' 'Iris-versicolor']
print('y_hat_prob = \n', y_hat_prob)
y_hat_prob =
[[0.85534909 0.09533414 0.04931677]
[0.98474307 0.00832482 0.00693211]
[0.98625508 0.00560102 0.00814391]
[0.99481669 0.00337505 0.00180826]
[0.99555309 0.00029521 0.0041517 ]
[0.99869804 0.00071571 0.00058625]
[0.96800558 0.02312483 0.00886959]
[0.98822691 0.00186241 0.00991067]
[0.90246766 0.0651196 0.03241274]
[0.97733905 0.00824859 0.01441237]
[0.99239647 0.00505615 0.00254739]
[0.91642349 0.04917927 0.03439724]
[0.99676732 0.00030162 0.00293106]
[0.9398823 0.00009638 0.06002132]
[0.99461788 0. 0.00538212]
[0.99555309 0.00029521 0.0041517 ]
[0.97350852 0.01856996 0.00792152]
[0.93843364 0.00421448 0.05735187]
[0.99895347 0.00038381 0.00066272]
[0.76891197 0.16398317 0.06710486]
[0.99654569 0.00181396 0.00164035]
[0.99985893 0.00008626 0.0000548 ]
[0.89079074 0.08149018 0.02771908]
[0.99239647 0.00505615 0.00254739]
[0.76081779 0.16883512 0.07034708]
[0.96800558 0.02312483 0.00886959]
[0.95362866 0.03209173 0.01427961]
[0.90242719 0.07174113 0.02583168]
[0.98474307 0.00832482 0.00693211]
[0.94634023 0.03227143 0.02138834]
[0.76891197 0.16398317 0.06710486]
[0.99990061 0.00000102 0.00009837]
[0.99854342 0.00000022 0.00145636]
[0.90246766 0.0651196 0.03241274]
[0.89203188 0.07793728 0.03003084]
[0.82742392 0.10540162 0.06717445]
[0.90246766 0.0651196 0.03241274]
[0.99357802 0.00107196 0.00535003]
[0.94205304 0.04253986 0.01540711]
[0.98600147 0.00980546 0.00419307]
[0.73275657 0.24653389 0.02070954]
[0.99868435 0.00031417 0.00100147]
[0.98600147 0.00980546 0.00419307]
[0.99895347 0.00038381 0.00066272]
[0.91642349 0.04917927 0.03439724]
[0.99895347 0.00038381 0.00066272]
[0.99300324 0.00314587 0.0038509 ]
[0.98722324 0.00541553 0.00736122]
[0.9372624 0.04564148 0.01709611]
[0.00066831 0.19151727 0.80781442]
[0.03574259 0.42213636 0.54212105]
[0.00172174 0.2380128 0.76026547]
[0.00029696 0.8681783 0.13152475]
[0.00950958 0.39994199 0.59054842]
[0.04283799 0.66526153 0.29190048]
[0.06414466 0.4034332 0.53242214]
[0.09112085 0.81279954 0.09607961]
[0.00919148 0.3627435 0.62806502]
[0.14866386 0.65381848 0.19751766]
[0.00000411 0.98713147 0.01286442]
[0.06444261 0.60066389 0.3348935 ]
[0.00000726 0.77291594 0.2270768 ]
[0.03528951 0.56903905 0.39567144]
[0.08306604 0.64863356 0.2683004 ]
[0.00778275 0.32411726 0.66809999]
[0.12020666 0.62210179 0.25769155]
[0.02028952 0.67175405 0.30795643]
[0.00000391 0.65907266 0.34092342]
[0.00481714 0.77187864 0.22330423]
[0.11985284 0.5470987 0.33304846]
[0.02379884 0.57797637 0.39822479]
[0.00126724 0.49987542 0.49885734]
[0.02379884 0.57797637 0.39822479]
[0.01930587 0.46089705 0.51979708]
[0.01163503 0.37280608 0.61555889]
[0.00183391 0.23007537 0.76809072]
[0.00679824 0.32461748 0.66858428]
[0.04065629 0.59470414 0.36463958]
[0.01100368 0.71867499 0.27032133]
[0.00166588 0.82860137 0.16973275]
[0.00166588 0.82860137 0.16973275]
[0.02028952 0.67175405 0.30795643]
[0.01541111 0.62137558 0.36321331]
[0.22790609 0.5687734 0.20332051]
[0.22911292 0.37297796 0.39790913]
[0.00778275 0.32411726 0.66809999]
[0.00003469 0.54029537 0.45966993]
[0.12020666 0.62210179 0.25769155]
[0.00647063 0.78828408 0.20524529]
[0.01829676 0.74880171 0.23290153]
[0.047037 0.55814244 0.39482055]
[0.00920271 0.69735007 0.29344721]
[0.00660544 0.92151972 0.07187483]
[0.03004904 0.70327522 0.26667574]
[0.09397292 0.62309996 0.28292712]
[0.06590712 0.64403573 0.29005715]
[0.03009288 0.53848234 0.43142478]
[0.0533754 0.79584651 0.15077809]
[0.04283799 0.66526153 0.29190048]
[0.06414466 0.4034332 0.53242214]
[0.02028952 0.67175405 0.30795643]
[0.00017942 0.15797581 0.84184477]
[0.02476733 0.5025339 0.47269877]
[0.01778844 0.41878697 0.56342459]
[0.00000002 0.07518897 0.92481101]
[0.26845465 0.61461222 0.11693313]
[0.00000746 0.09075442 0.90923812]
[0.00020758 0.20054784 0.79924458]
[0.00007868 0.02245862 0.97746269]
[0.02450888 0.38498824 0.59050288]
[0.00733268 0.44265197 0.55001535]
[0.00348126 0.27669229 0.71982645]
[0.00380647 0.75199042 0.24420311]
[0.0359175 0.65054005 0.31354245]
[0.03574259 0.42213636 0.54212105]
[0.01778844 0.41878697 0.56342459]
[0. 0.00115857 0.99884143]
[0. 0.01728022 0.98271978]
[0.00000726 0.77291594 0.2270768 ]
[0.00183015 0.22552467 0.77264519]
[0.05348703 0.67448608 0.27202689]
[0. 0.0390756 0.9609244 ]
[0.00946976 0.4985239 0.49200634]
[0.00987797 0.26085566 0.72926638]
[0.00005193 0.13855405 0.86139402]
[0.02041318 0.54303587 0.43655094]
[0.047037 0.55814244 0.39482055]
[0.01316304 0.45370611 0.53313084]
[0.00004617 0.13077655 0.86917728]
[0.00000075 0.05736916 0.94263009]
[0. 0.0012725 0.9987275 ]
[0.01316304 0.45370611 0.53313084]
[0.01686445 0.50166674 0.48146882]
[0.00596213 0.60032111 0.39371676]
[0. 0.07108026 0.92891974]
[0.0949242 0.31744331 0.58763249]
[0.0297933 0.44965611 0.5205506 ]
[0.05504791 0.58154689 0.3634052 ]
[0.00172174 0.2380128 0.76026547]
[0.00778275 0.32411726 0.66809999]
[0.00172174 0.2380128 0.76026547]
[0.02028952 0.67175405 0.30795643]
[0.00427181 0.26354829 0.7321799 ]
[0.00987797 0.26085566 0.72926638]
[0.00679824 0.32461748 0.66858428]
[0.00126724 0.49987542 0.49885734]
[0.01778844 0.41878697 0.56342459]
[0.13172405 0.34233852 0.52593742]
[0.06444261 0.60066389 0.3348935 ]]
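Each row of y_hat_prob holds the predicted probabilities for one sample, with one column per class in the order given by the classifier's classes_ attribute. The sketch below checks two properties on a comparable model: every row sums to 1, and the label returned by predict is the class with the largest probability. It uses load_iris and a plain LogisticRegression with max_iter=1000, both of which are assumptions made to keep the example self-contained.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X2 = X[:, :2]  # sepal length and sepal width only
clf = LogisticRegression(max_iter=1000).fit(X2, y)

proba = clf.predict_proba(X2)
# Each row is a probability distribution over the three classes.
row_sums = proba.sum(axis=1)
# The predicted label is the class whose column holds the largest probability.
labels_from_proba = clf.classes_[np.argmax(proba, axis=1)]

print(np.allclose(row_sums, 1.0))
print(np.array_equal(labels_from_proba, clf.predict(X2)))
```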
Print the accuracy (measured on the same 149 samples the model was trained on):
print('Accuracy: %.2f%%' % (100*np.mean(y_hat == y.ravel())))
Accuracy: 80.54%
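The accuracy above is computed on the training data, which tends to be optimistic. A minimal sketch of a held-out evaluation of the same kind of pipeline follows. The 75/25 stratified split, the use of load_iris (150 rows rather than 149), and max_iter=1000 are assumptions, so the numbers it prints will not match the figure above exactly.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X, y = load_iris(return_X_y=True)
X = X[:, :2]  # sepal length and sepal width only, as in the article

# Hold out a quarter of the samples, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

lr = Pipeline([
    ('sc', StandardScaler()),
    ('poly', PolynomialFeatures(degree=3)),
    ('clf', LogisticRegression(max_iter=1000)),
])
lr.fit(X_train, y_train)

train_acc = lr.score(X_train, y_train)
test_acc = lr.score(X_test, y_test)
print('train: %.2f%%, test: %.2f%%' % (100 * train_acc, 100 * test_acc))
```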
We now repeat the analysis with a K-nearest-neighbors classifier:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
# train_test_split returns x_train, x_test, y_train, y_test in that order;
# 0.8 is passed as train_size, i.e. 80% of the samples are used for training
x_train, x_test, y_train, y_test = train_test_split(iris_dataset['data'], iris_dataset['target'], train_size=0.8, random_state=0)
iris_dataframe = pd.DataFrame(x_train, columns=iris_dataset.feature_names)
# Pairwise scatter plots of the four features, colored by class
grr = pd.plotting.scatter_matrix(iris_dataframe, c=y_train, marker='o', figsize=(10, 10), hist_kwds={'bins': 20}, s=60, alpha=0.8, cmap='viridis')
plt.show()
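The code above draws a pairwise scatter-plot matrix of the training features, with points colored by class.
Import the K-nearest-neighbors classifier and fit it: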
from sklearn.neighbors import KNeighborsClassifier
Knn = KNeighborsClassifier(n_neighbors=1)
Knn.fit(x_train, y_train)
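With n_neighbors=1, the classifier assigns each query point the label of its single closest training point. A rough NumPy sketch of that idea, using Euclidean distance, is shown below. The helper predict_1nn and the toy points are hypothetical and only illustrate the principle; they are not how scikit-learn implements the search internally.

```python
import numpy as np

def predict_1nn(x_train, y_train, x_query):
    """Label each query point with the label of its nearest training point."""
    # Euclidean distance from every query point to every training point.
    dists = np.linalg.norm(x_query[:, None, :] - x_train[None, :, :], axis=2)
    # Index of the closest training point for each query point.
    nearest = np.argmin(dists, axis=1)
    return y_train[nearest]

# Toy data: three training points, one per class.
toy_train = np.array([[1.0, 1.0], [5.0, 5.0], [9.0, 1.0]])
toy_labels = np.array([0, 1, 2])
toy_query = np.array([[1.5, 0.5], [8.0, 2.0], [5.0, 4.0]])

pred = predict_1nn(toy_train, toy_labels, toy_query)
print(pred)
```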
Predict on the test set and print the score:
y_pred = Knn.predict(x_test)
print("Test set score: {:.2f}".format(np.mean(y_pred == y_test)))
Result:
Test set score: 0.97
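This score is higher than the logistic regression result above, although the two figures are not directly comparable: the logistic regression model used only two features and was scored on its own training data, whereas the K-nearest-neighbors model used all four features and was scored on a held-out test set. Further parameter tuning, for example of n_neighbors, may improve the accuracy.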
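One common way to tune the classifier is to compare several values of n_neighbors by cross-validation. The sketch below does this with cross_val_score; the candidate values and the choice of 5 folds are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

scores = {}
for k in (1, 3, 5, 7, 9):
    knn = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 cross-validation folds.
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print('k=%d: %.3f' % (k, s))
print('best k:', best_k)
```

Cross-validation averages over several splits, so it is less sensitive to one lucky or unlucky test set than a single train/test split.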
This article was shared from the WeChat official account "python pytorch AI机器学习实践".