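1. Idea:
Given a "point", kNN computes the distance between that point and every "point" whose result is already known, selects the k nearest of those points, and averages their known results. That average is used as the prediction for the given "point".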
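To make the idea concrete, here is a minimal worked sketch in plain Python. The one-dimensional points, their values, and k = 3 are made up for illustration only.

```python
# Illustrative toy data (assumed for this sketch): (feature, known result) pairs.
known = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (10.0, 100.0)]
query = 2.2  # the "point" whose result we want to predict
k = 3

# Sort the known points by their distance to the query point.
by_distance = sorted(known, key=lambda p: abs(p[0] - query))

# Average the known results of the k nearest points.
nearest_values = [value for _, value in by_distance[:k]]
prediction = sum(nearest_values) / k
print(prediction)  # 20.0
```

The three nearest points are 2.0, 3.0 and 1.0, so the prediction is (20 + 30 + 10) / 3. The far-away point at 10.0 has no influence.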
2. Implementation
Load the required packages:
import numpy as np
# Used to display progress
from tqdm import tqdm
# Split the data x and y into train and test sets
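The split itself is not shown in the original. One possible sketch with NumPy only is given below. The helper name `train_test_split_simple`, the 80/20 ratio, and the toy data are assumptions made for illustration.

```python
import numpy as np

def train_test_split_simple(x, y, test_ratio=0.2, seed=0):
    """Hypothetical helper: shuffle the samples and split them into train/test parts."""
    x = np.asarray(x)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))            # shuffled sample indices
    n_test = int(round(len(x) * test_ratio))   # number of test samples
    test_idx, train_idx = order[:n_test], order[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]

# Toy data assumed for illustration: 10 samples with 2 features each.
x = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)
x_train, x_test, y_train, y_test = train_test_split_simple(x, y, test_ratio=0.2)
print(x_train.shape, x_test.shape)
```

Fixing the seed keeps the split reproducible between runs, which makes later scores comparable.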
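# If the data needs to be normalized, the following function can be used
def normData(dataSet):
    # Min-max scale each column (feature) to the range [0, 1]
    maxVals = dataSet.max(axis=0)
    minVals = dataSet.min(axis=0)
    ranges = maxVals - minVals
    retData = (dataSet - minVals) / ranges
    return retData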
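Two caveats about min-max scaling as written: a constant column gives a range of zero and therefore a division by zero, and scaling the test set with its own minimum and maximum puts it on a different scale than the training set. The sketch below shows one way to handle both. The helper names `fit_min_max` and `apply_min_max` and the toy arrays are hypothetical, not part of the original code.

```python
import numpy as np

def fit_min_max(train):
    """Hypothetical helper: per-column minimum and range, computed from training data only."""
    min_vals = train.min(axis=0)
    ranges = train.max(axis=0) - min_vals
    # Replace zero ranges (constant columns) by 1 to avoid division by zero.
    ranges = np.where(ranges == 0, 1.0, ranges)
    return min_vals, ranges

def apply_min_max(data, min_vals, ranges):
    """Scale data using statistics computed earlier by fit_min_max."""
    return (data - min_vals) / ranges

# Toy data assumed for illustration; the second column is constant.
x_train = np.array([[1.0, 5.0], [3.0, 5.0], [5.0, 5.0]])
x_test = np.array([[2.0, 5.0]])

min_vals, ranges = fit_min_max(x_train)
x_train_norm = apply_min_max(x_train, min_vals, ranges)
x_test_norm = apply_min_max(x_test, min_vals, ranges)
print(x_train_norm)
print(x_test_norm)
```

With this approach a test value outside the training range maps outside [0, 1], which is usually acceptable for distance-based methods.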
# Predict a single sample
def kNN(dataSet, blowers, testData, k):
    # Euclidean distance from testData to every row of dataSet
    distSquareMat = (dataSet - testData) ** 2
    distSquareSums = distSquareMat.sum(axis=1)
    distances = distSquareSums ** 0.5
    # Indices of the k nearest training samples
    sortedIndices = distances.argsort()
    indices = sortedIndices[:k]
    # Average the target values of the k nearest samples
    blowerList = [blowers[i] for i in indices]
    result = sum(blowerList) / len(blowerList)
    return result
# Predict multiple samples (testData and dataSet are 2-D arrays, blowers is a 1-D array)
def predict(dataSet, blowers, testData, k):
    predict_result = []
    for i in range(len(testData)):
        result = kNN(dataSet, blowers, testData[i], k)
        predict_result.append(result)
    predict_result = np.array(predict_result)
    return predict_result
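For larger test sets the Python loop can be slow. As an alternative sketch, the same computation can be done for all test samples at once with NumPy broadcasting. The function name `predict_vectorized` and the toy data are assumptions for illustration. The intermediate array has shape (n_test, n_train, n_features), so this only suits data that fits in memory.

```python
import numpy as np

def predict_vectorized(dataSet, blowers, testData, k):
    """Sketch: kNN regression for all test samples at once via broadcasting."""
    dataSet = np.asarray(dataSet, dtype=float)
    blowers = np.asarray(blowers, dtype=float)
    testData = np.asarray(testData, dtype=float)
    # diffs has shape (n_test, n_train, n_features)
    diffs = testData[:, None, :] - dataSet[None, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=2))   # shape (n_test, n_train)
    nearest = np.argsort(distances, axis=1)[:, :k]  # k nearest indices per test sample
    return blowers[nearest].mean(axis=1)            # average their target values

# Toy data assumed for illustration.
train_x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
train_y = np.array([1.0, 2.0, 3.0, 10.0])
test_x = np.array([[0.2, 0.1], [4.0, 3.0]])

result = predict_vectorized(train_x, train_y, test_x, k=2)
print(result)  # [1.5 6. ]
```

For the first test sample the two nearest training rows are (0, 0) and (1, 0), giving (1 + 2) / 2. For the second they are (5, 5) and (1, 0), giving (10 + 2) / 2.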
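Model evaluation (R² is used as the metric here)
from sklearn.metrics import r2_score
r2_list = []
for k in tqdm(range(1, 16)):  # the upper bound 15 can be set freely; assumes each iteration evaluates a different k
    y_pred = predict(x_train_norm, y_train, x_test_norm, k)
    r2 = r2_score(y_test, y_pred)
    r2_list.append(r2)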
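For reference, R² for a single target is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand. The function name `r2_manual` and the toy values are illustrative assumptions, and the function assumes the true values are not all identical.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (assumes y_true is not constant)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Toy values assumed for illustration.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
perfect = r2_manual(y_true, y_true)                       # perfect prediction
baseline = r2_manual(y_true, np.full(4, y_true.mean()))   # always predict the mean
print(perfect, baseline)  # 1.0 0.0
```

A score of 1 means a perfect fit, 0 means no better than always predicting the mean, and negative values mean worse than that baseline.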
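3. Alternatively, use from sklearn.neighbors import KNeighborsRegressor directly
Import the module above, create an estimator instance, and call fit on it (fit is an instance method, so it cannot be called on the class itself):
KNeighborsRegressor().fit(X=train_x, y=train_y)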
Parameter description:
fit(self, X, y)
    Fit the model using X as training data and y as target values

    Parameters
    ----------
    X : Training data. If array or matrix, shape [n_samples, n_features],
        or [n_samples, n_samples] if metric='precomputed'.
    y : Target values, array of float values, shape = [n_samples]
        or [n_samples, n_outputs]
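A short end-to-end sketch of the scikit-learn route follows. The toy arrays and the choice `n_neighbors=2` are assumptions for illustration. With the default uniform weights, the prediction is the plain average of the k nearest targets, the same rule as the hand-written version.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy data assumed for illustration: one feature, targets equal to the feature.
train_x = np.array([[0.0], [1.0], [2.0], [3.0]])
train_y = np.array([0.0, 1.0, 2.0, 3.0])
test_x = np.array([[1.4]])

model = KNeighborsRegressor(n_neighbors=2)  # uniform weights by default
model.fit(X=train_x, y=train_y)
y_pred = model.predict(test_x)
print(y_pred)  # [1.5]
```

The two nearest training points to 1.4 are 1.0 and 2.0, so the prediction is their average, 1.5.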
Note: python-version
You can follow my CSDN account to learn together: Andrew_jdw
My knowledge is limited, so corrections and suggestions are welcome.