# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
from sklearn import preprocessing
import seaborn as sns
%matplotlib inline

# Read the data
df = pd.read_csv('./EngineeredData_2.csv')
df = df.dropna()

# Split the data into X and y:
X = df.drop(['Week', 'Div', 'Date', 'HomeTeam', 'AwayTeam', 'HTHG', 'HTAG', 'HTR',
             'FTAG', 'FTHG', 'HGKPP', 'AGKPP', 'FTR'], axis=1)

# Convert y to integers:
L = preprocessing.LabelEncoder()
matchresults = L.fit_transform(list(df['FTR']))
y = list(matchresults)

# Split the data into training and test sets:
from sklearn.model_selection import train_test_split
X_tng, X_tst, y_tng, y_tst = train_test_split(X, y, test_size=50, shuffle=False)
X_tng.head()

# Import the class
from sklearn.linear_model import LogisticRegression

# Instantiate the model
logreg = LogisticRegression()

# Fit the model to the data
logreg.fit(X_tng, y_tng)

# Predict on the test data
y_pred = logreg.predict(X_tst)
acc = logreg.score(X_tst, y_tst)
print(acc)

Does an accuracy of 100% make sense?
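The label-encoding and splitting steps in the question can be checked in isolation. The minimal sketch below uses made-up match results and a made-up array rather than the question's CSV: `LabelEncoder` assigns integer codes in sorted order of the labels, and an integer `test_size` combined with `shuffle=False` puts exactly that many rows, taken from the end, into the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical full-time results: H = home win, D = draw, A = away win
ftr = ['H', 'A', 'D', 'H', 'H', 'A']
le = LabelEncoder()
codes = le.fit_transform(ftr)
print(le.classes_.tolist())  # ['A', 'D', 'H']  (classes are sorted)
print(codes.tolist())        # [2, 0, 1, 2, 2, 0]

# An integer test_size means "this many rows", not a fraction.
# With shuffle=False the original order is kept, so the test rows are the last ones.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=3, shuffle=False)
print(y_te.tolist())         # [7, 8, 9]
```

In the question, `test_size=50` therefore holds out the last 50 matches of the file, which is a sensible choice for time-ordered fixtures.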
Posted on 2019-12-06 16:55:00
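The problem is that you inadvertently dropped all of the features and kept only the target values in X. You are therefore trying to explain the target using the target itself, which of course gives you 100% accuracy. You defined the feature columns as: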
X = df.drop(['Week', 'Div', 'Date', 'HomeTeam', 'AwayTeam', 'HTHG', 'HTAG', 'HTR',
             'FTAG', 'FTHG', 'HGKPP', 'AGKPP', 'FTR'], axis=1)

but you should define them as:
X = df.drop('FTR', axis=1)

https://stackoverflow.com/questions/59119041
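To see why having the target (or an exact copy of it) in X yields near-perfect accuracy, and to catch such a column before training, here is a minimal sketch on synthetic data. It does not use the question's EngineeredData_2.csv; the columns `f1` to `f4` and `result_copy`, and the helper `columns_that_determine`, are hypothetical. The helper is only a crude heuristic: it flags single columns that map one-to-one onto the target, so leakage spread over several columns (for example, a result computed from two goal columns) is not caught, and on small data a low-cardinality column can be flagged by chance.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def columns_that_determine(df, target):
    """Hypothetical helper: return feature columns in which every distinct
    value maps to exactly one target label. Columns whose values are all
    distinct (IDs, timestamps) are skipped because they would trivially pass."""
    suspicious = []
    for col in df.columns.drop(target):
        if df[col].nunique() == len(df):
            continue
        if (df.groupby(col)[target].nunique() == 1).all():
            suspicious.append(col)
    return suspicious


# Synthetic data: four noise features unrelated to the result,
# plus a column that is simply the encoded target under another name.
rng = np.random.default_rng(0)
n = 300
data = pd.DataFrame(rng.normal(size=(n, 4)), columns=["f1", "f2", "f3", "f4"])
data["FTR"] = rng.integers(0, 3, size=n)   # 0, 1, 2 stand in for A, D, H
data["result_copy"] = data["FTR"]          # the leak

y_syn = data["FTR"]
scores = {}
for name, cols in [("clean", ["f1", "f2", "f3", "f4"]),
                   ("leaky", ["f1", "f2", "f3", "f4", "result_copy"])]:
    X_tng, X_tst, y_tng, y_tst = train_test_split(
        data[cols], y_syn, test_size=50, shuffle=False)
    model = LogisticRegression(max_iter=1000).fit(X_tng, y_tng)
    scores[name] = model.score(X_tst, y_tst)   # mean accuracy on the test rows

print(scores)  # expect the leaky model near 1.0 and the clean one near chance (about 1/3)
print(columns_that_determine(data, "FTR"))     # flags 'result_copy'
```

Running such a check on the feature frame before calling `fit` is cheap, and a 100% test score is itself a strong hint to look for this kind of leak.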