
How to test new data against an existing Naive Bayes model (Python 3)

Stack Overflow user
Asked 2019-10-16 02:05:09
Answers: 1 · Views: 55 · Followers: 0 · Votes: 0

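Apologies for any obvious mistakes here; I'm a real beginner. I split my dataset into training and test sets and successfully applied a Gaussian Naive Bayes classifier, which gave an accuracy of 0.8888 (see the code below). Now I want to apply a second dataset to this existing model: it has the same features and labels, but the outcomes are unknown. How can I do this?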

import pandas as pd
import numpy as np

testdf = pd.read_csv("train_predictions.csv")

#change output settings
pd.set_option("display.width", 400)
pd.set_option("display.max_columns", 20)
pd.set_option("display.max_rows", 200)

# print data types of each column
print(testdf.dtypes)

# transform str data to numerical
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
testdf["ID"] = le.fit_transform(testdf["ID"])
testdf["THAL"] = le.fit_transform(testdf["THAL"])
print(testdf.head())

# ID is not relevant to model, HEART DZ will be our target
cols = [col for col in testdf.columns if col not in    ["ID","HEART DZ"]]
data = testdf[cols]
target = testdf["HEART DZ"]
print(data.head())

from sklearn.model_selection import train_test_split
# split dataset
data_train, data_test, target_train, target_test = train_test_split(data, target, test_size=0.30, random_state=10)

# Gaussian Naive Bayes
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

gnb = GaussianNB()
pred = gnb.fit(data_train, target_train).predict(data_test)
print("Naive-Bayes accuracy : ",accuracy_score(target_test, pred, normalize=True))

Updated code:

testdf = pd.read_csv("train_predictions.csv")
predictdf = pd.read_csv("export_dataframe.csv")

#change output settings
pd.set_option("display.width", 400)
pd.set_option("display.max_columns", 20)
pd.set_option("display.max_rows", 200)

# print data types of each column
#print(predictdf.head())

# transform str data to numerical
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
testdf["ID"] = le.fit_transform(testdf["ID"])
testdf["THAL"] = le.fit_transform(testdf["THAL"])
predictdf["ID"] = le.fit_transform(predictdf["ID"])
predictdf["THAL"] = le.fit_transform(predictdf["THAL"])
#print(predictdf.head())

# ID is not relevant to model, HEART DZ will be our target (drop them)
cols = [col for col in testdf.columns if col not in ["ID","HEART DZ"]]
data = testdf[cols]
target = testdf["HEART DZ"]
pred_cols = [col for col in predictdf.columns if col not in ["ID","HEART DZ"]]
pred_data = predictdf[cols]
pred_target = predictdf["HEART DZ"]
#print(pred_data.head())

from sklearn.model_selection import train_test_split
# split dataset
data_train, data_test, target_train, target_test = train_test_split(data, target, test_size=0.30) #random_state=10)

# Gaussian Naive Bayes
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

gnb = GaussianNB()
pred = gnb.fit(data_train, target_train).predict(data_test)
predictions = gnb.predict([predictdf])
#print("Naive-Bayes accuracy : ",accuracy_score(target_test, pred, normalize=True))
print(predictions)

Updated code 2:

testdf = pd.read_csv("train_predictions.csv")
testlabelsdf = pd.read_csv("train_labels.csv")
predictdf = pd.read_csv("export_dataframe.csv")
#print(testlabelsdf.head())

# transform str to int
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
testdf["ID"] = le.fit_transform(testdf["ID"])
predictdf["ID"] = le.fit_transform(predictdf["ID"])
testdf["THAL"] = le.fit_transform(testdf["THAL"])
predictdf["THAL"] = le.fit_transform(predictdf["THAL"])

# ID is not relevant to model, HEART DZ will be our target (drop them)
cols = [col for col in testdf.columns if col not in ["ID"]]
data = testdf[cols]
target = testlabelsdf["HEART DZ"]

from sklearn.model_selection import train_test_split
# split dataset
data_train, data_test, target_train, target_test = train_test_split(data, target, random_state=10) #test_size=0.30,random_state=10)

# Gaussian Naive Bayes
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

gnb = GaussianNB()
gnb.fit(data_train, target_train)
target_pred = gnb.predict(data_test)
ac = accuracy_score(target_test, target_pred, normalize=True)

yNew = gnb.predict(predictdf)
#print(yNew)

for i in range(len(predictdf)):
    print("Predicted: ", yNew[i])

1 Answer

Stack Overflow user

Answered 2019-10-16 02:17:37

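Every classifier in the scikit-learn library provides a method named predict, and your GaussianNB is no exception. To predict labels for new data, call the .predict method on the fitted model, as follows: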

# ... your code here
gnb = GaussianNB()  # again, your code
# ... your code here as well, including the gnb.fit(...) call
data_to_predict = ...  # load your new data to predict here
predictions = gnb.predict(data_to_predict)  # predict is a method, so it is called with parentheses
print(predictions)
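
To make the outline above concrete, here is a minimal runnable sketch. It uses small synthetic data instead of the asker's CSV files, and the column names AGE and CHOL are hypothetical stand-ins for the real features. The point it illustrates is that the new data passed to predict should contain exactly the feature columns used in fit (no ID and no target column), as a 2-D table.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB

# Synthetic training data (hypothetical columns standing in for the real file).
ages = np.arange(30, 80)
train_df = pd.DataFrame({
    "ID": np.arange(len(ages)),
    "AGE": ages,
    "CHOL": 180 + (ages * 7) % 60,
})
train_df["HEART DZ"] = (train_df["AGE"] > 55).astype(int)

# New, unlabeled rows: same feature columns, but no "HEART DZ" column.
new_df = pd.DataFrame({"ID": [900, 901], "AGE": [35, 75], "CHOL": [200, 210]})

# Use one column list for both frames so features match in name and order.
feature_cols = [c for c in train_df.columns if c not in ("ID", "HEART DZ")]

gnb = GaussianNB()
gnb.fit(train_df[feature_cols], train_df["HEART DZ"])

# Pass a 2-D table of features only (no ID, no target) to predict.
predictions = gnb.predict(new_df[feature_cols])
print(predictions)
```

Applied to the question's code, this would mean calling gnb.predict on predictdf[cols] rather than on the whole predictdf or on a list wrapping it, assuming cols holds only the feature columns the model was trained on.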

For details, see the scikit-learn documentation.
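
One further point about the question's code may matter for the predictions: it calls fit_transform on THAL separately for the training data and for the new data, so the same category can receive different integer codes in the two files. A sketch of the alternative, fitting the encoder once and reusing it, is below. The THAL values shown are hypothetical, since the real categories do not appear in the question.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical THAL values; the real categories are not shown in the question.
train_thal = pd.Series(["normal", "fixed", "reversible", "normal"])
new_thal = pd.Series(["reversible", "normal"])

# Fit once on the training column, then reuse the same mapping on new data.
thal_encoder = LabelEncoder()
train_codes = thal_encoder.fit_transform(train_thal)
new_codes = thal_encoder.transform(new_thal)

# Refitting on the new data alone yields a different mapping.
refit_codes = LabelEncoder().fit_transform(new_thal)

print(list(thal_encoder.classes_))
print(new_codes, refit_codes)
```

Note that transform raises a ValueError if the new data contains a category that was not present during fitting, so unseen values need to be handled before encoding.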

Votes: 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/58400347
