首页
学习
活动
专区
工具
TVP
发布
精选内容/技术社群/优惠产品,尽在小程序
立即前往

监督学习之knn和svm

偷懒了好几天,这两天总算把信息安全的进度赶了赶,在关联算法失效的时候决定用监督学习的算法解决,起初决定采用knn来分类,在后续学习中,无意发现了svm,在测试中发现svm的准确率比最好的knn搞0.1个百分比,故最终采用了svm。下对两种监督学习进行简介。

一、简单的理论介绍

首先,对监督学习讲解一下,监督学习和无监督学习的区别就是训练数据的label的有无,监督学习需要带有label,而无监督学习不需要label。在这次实践中,我们做的是网络入侵检测,采用的数据集是kddcup的99数据集(1), 拥有42个维度,其中前41个是描述网络行为的特征值,最后一个是类型,即label。

knn,即k-NearestNeighbor,k最邻近问题,为方便介绍,假设每条数据有两个特征值x和y,一个label,即点的颜色,先将所有数据放在平面直角坐标系中,如下图1.1的红点和蓝点,红点和蓝点所构成的所有点即为训练集,而绿点则是测试点,k最邻近问题的最邻近就是直观的邻近的意思,即离得近,而k指的是找几个离得最近的,如果k=3,那么所选的点即为实线所包含的三个点,若k=5,则为虚线所包含的五个点,而对于预测点分类的预测则是根据所选k个点中最多个数的类别所确定,同样以下图为例,如果k=3,那么预测点的结果将为红色(2个红色,1个蓝色),如果k=5,那么预测点的结果将为蓝色(3个蓝色,2个红色),由此可见,参数k的选取直接影响了预测结果的准确度。

图1.1 knn示例

svm,即Support Vector Machine,中文翻译也很暴力,支持向量机,一听就给人一种懵逼的感觉,下面做简单分解解释:机,即机器,指的是这个模型是一个机器,此外它的作用是分类,所以可以理解为一个分类用的机器,support vevtoe之后再介绍。同样为了简单介绍采用二维介绍,样本同样是带有颜色label的有x和y两个属性的训练点集合,如下图1.2,svm要做什么呢,它要找一条线,使得把两个类别的点区分开来,那么对于接下来的测试点,看测试点位于哪一侧,就将其归类于该类,那么问题来了,符合这个要求的线有很多条,比如图中的黑线和灰线就是其中的两条,那么什么才是最优解呢,现在就要介绍support vector了,就是两个类别中的点离这条分割线最近的距离,如何才是最优解呢,就是让两个类别的离分割线最近的点,再回到最近的问题,什么才是最优解呢,那就是支持向量离分割线越远越好,因为距离越远,允许容纳的点越多,使得分类的越平均,更加理想。

图1.2 svm示例

上面所说的是线性可区分的,然而遇到线性不可分的该怎么办呢,svm又引入了kernel,kernel学的没有十分的透彻,简单的理解了一下,就是把平面变成了曲面,即将原平面函数变成凸二次函数,再加上约束条件,就完成了一个通过增维使非线性样点线性可区分的目的,之后用一个平面去切割凸面,则就找到了一个分割面,把测试点带入凸二次函数中,即可得到划分结果。

有个可视化svm的kernel的视频(2),个人看了表示恍然大悟的感觉

对于高维的svm,同理二维,继续升维,找到超平面,一刀切下去即可、不过据说还有些更高级的处理方式,学疏才浅,没看明白,就不瞎掰扯了。

二、实践

在这次实践中,主要用的是numpy和sklearn两个库,numpy用于存放训练集,label集,还有测试集,以及预测结果,sklearn则是提供了众多机器学习模型,因为懒,不想自己写了,所以直接调用了。

先说knn吧,先建它一个knn,后面的参数是调整k值的(默认是5),之后把原材料,即训练数据和训练标签fit进去,好了,一个knn就搭建好了。

fromsklearnimportneighbors

knn=neighbors.KNeighborsClassifier(n_neighbors=5)

knn.fit(trainData,trainLabel)

之后,把测试数据predict一下,就可以得到预测结果了。

predict = knn.predict(testData)

此外,还可以知道每个类别的可能性。

predict_proba=knn.predict_proba(testData)

好了,这样一个knn的搭建、导入训练集、预测的过程就结束了。

之后再来说svm的,更加简单。

fromsklearnimport

svm=svm.SVC()

svm.fit(trainData,trainLabel)

预测一下testData。

predict=clf.predict(testData)

三、彩(shua)蛋(ping)

如开头所说,我本来是学习knn的,后来偶然间发现了svm,所以我就也测试了一下,将knn的k从1到49(range(50)呵呵哒)都试了下,结果惊喜的发现,svm比最好的knn的预测结果都准一点点,测试结果如下

1:

knn: Total:494021 Correct: 433614 Error: 60407 Accuracy: 0.8777238214569826

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

2:

knn: Total:494021 Correct: 431819 Error: 62202 Accuracy: 0.8740903726764652

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

3:

knn: Total:494021 Correct: 433837 Error: 60184 Accuracy: 0.8781752192720552

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

4:

knn: Total:494021 Correct: 433838 Error: 60183 Accuracy: 0.878177243477504

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

5:

knn: Total:494021 Correct: 434368 Error: 59653 Accuracy: 0.8792500723653448

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

6:

knn: Total:494021 Correct: 436099 Error: 57922 Accuracy: 0.8827539719971418

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

7:

knn: Total:494021 Correct: 436564 Error: 57457 Accuracy: 0.8836952275308134

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

8:

knn: Total:494021 Correct: 436603 Error: 57418 Accuracy: 0.883774171543315

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

9:

knn: Total:494021 Correct: 436921 Error: 57100 Accuracy: 0.8844178688760195

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

10:

knn: Total:494021 Correct: 436910 Error: 57111 Accuracy: 0.8843956026160831

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

11:

knn: Total:494021 Correct: 437011 Error: 57010 Accuracy: 0.8846000473664075

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

12:

knn: Total:494021 Correct: 436955 Error: 57066 Accuracy: 0.8844866918612772

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

13:

knn: Total:494021 Correct: 436978 Error: 57043 Accuracy: 0.8845332485865985

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

14:

knn: Total:494021 Correct: 437036 Error: 56985 Accuracy: 0.8846506525026264

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

15:

knn: Total:494021 Correct: 437080 Error: 56941 Accuracy: 0.8847397175423717

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

16:

knn: Total:494021 Correct: 436660 Error: 57361 Accuracy: 0.883889551253894

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

17:

knn: Total:494021 Correct: 436829 Error: 57192 Accuracy: 0.8842316419747339

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

18:

knn: Total:494021 Correct: 436522 Error: 57499 Accuracy: 0.8836102109019657

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

19:

knn: Total:494021 Correct: 436659 Error: 57362 Accuracy: 0.8838875270484453

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

20:

knn: Total:494021 Correct: 436492 Error: 57529 Accuracy: 0.883549484738503

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

21:

knn: Total:494021 Correct: 436868 Error: 57153 Accuracy: 0.8843105859872353

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

22:

knn: Total:494021 Correct: 436497 Error: 57524 Accuracy: 0.8835596057657468

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

23:

knn: Total:494021 Correct: 436452 Error: 57569 Accuracy: 0.8834685165205528

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

24:

knn: Total:494021 Correct: 436264 Error: 57757 Accuracy: 0.8830879658961865

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

25:

knn: Total:494021 Correct: 436323 Error: 57698 Accuracy: 0.8832073940176632

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

26:

knn: Total:494021 Correct: 436219 Error: 57802 Accuracy: 0.8829968766509926

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

27:

knn: Total:494021 Correct: 436259 Error: 57762 Accuracy: 0.8830778448689428

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

28:

knn: Total:494021 Correct: 436098 Error: 57923 Accuracy: 0.8827519477916931

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

29:

knn: Total:494021 Correct: 436127 Error: 57894 Accuracy: 0.882810649749707

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

30:

knn: Total:494021 Correct: 435959 Error: 58062 Accuracy: 0.8824705832343159

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

31:

knn: Total:494021 Correct: 436223 Error: 57798 Accuracy: 0.8830049734727876

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

32:

knn: Total:494021 Correct: 436117 Error: 57904 Accuracy: 0.8827904076952194

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

33:

knn: Total:494021 Correct: 436320 Error: 57701 Accuracy: 0.8832013214013169

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

34:

knn: Total:494021 Correct: 436410 Error: 57611 Accuracy: 0.8833834998917051

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

35:

knn: Total:494021 Correct: 436500 Error: 57521 Accuracy: 0.8835656783820931

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

36:

knn: Total:494021 Correct: 436557 Error: 57464 Accuracy: 0.8836810580926722

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

37:

knn: Total:494021 Correct: 436564 Error: 57457 Accuracy: 0.8836952275308134

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

38:

knn: Total:494021 Correct: 435528 Error: 58493 Accuracy: 0.881598150685902

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

39:

knn: Total:494021 Correct: 435561 Error: 58460 Accuracy: 0.881664949465711

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

40:

knn: Total:494021 Correct: 154708 Error: 339313 Accuracy: 0.31316077656617836

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

41:

knn: Total:494021 Correct: 154468 Error: 339553 Accuracy: 0.3126749672584769

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

42:

knn: Total:494021 Correct: 154426 Error: 339595 Accuracy: 0.31258995062962913

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

43:

knn: Total:494021 Correct: 150380 Error: 343641 Accuracy: 0.3044000153839614

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

44:

knn: Total:494021 Correct: 132313 Error: 361708 Accuracy: 0.26782869554128264

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

45:

knn: Total:494021 Correct: 124802 Error: 369219 Accuracy: 0.25262488841567465

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

46:

knn: Total:494021 Correct: 124646 Error: 369375 Accuracy: 0.25230911236566866

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

47:

knn: Total:494021 Correct: 116557 Error: 377464 Accuracy: 0.23593531449067956

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

48:

knn: Total:494021 Correct: 102720 Error: 391301 Accuracy: 0.20792638369623964

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

49:

knn: Total:494021 Correct: 102005 Error: 392016 Accuracy: 0.20647907680037894

svm: Total:494021 Correct: 437527 Error: 56494 Accuracy: 0.8856445373779657

遂knn卒,svm胜,狭路相逢准者胜。

特此公告:

初学者学习记录,不对之处请指正,不喜勿喷。

(1) kddcup99数据集地址:http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html

(2) kernel视频链接:

https://v.qq.com/x/page/k05170ntgzc.html

  • 发表于:
  • 原文链接http://kuaibao.qq.com/s/20180203G0WJ8D00?refer=cp_1026
  • 腾讯「腾讯云开发者社区」是腾讯内容开放平台帐号(企鹅号)传播渠道之一,根据《腾讯内容开放平台服务协议》转载发布内容。
  • 如有侵权,请联系 cloudcommunity@tencent.com 删除。

扫码

添加站长 进交流群

领取专属 10元无门槛券

私享最新 技术干货

扫码加入开发者社群
领券