Q: ValueError from NumPy arrays when training Auto-Keras with StratifiedKFold

Stack Overflow user

Asked 2022-05-06 02:42:18

1 answer, 63 views, 0 followers, 0 votes

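Background

My sentiment analysis research involves a wide variety of datasets. Recently I came across one dataset that, for some reason, I cannot train on successfully. I mostly work with open data in .CSV format, so Pandas and NumPy are used heavily.

One of the approaches in my research is to integrate automated machine learning (AutoML). The library I chose is Auto-Keras, mainly through its TextClassifier() wrapper.

Main problem

I have verified against the official documentation that TextClassifier() accepts data as NumPy arrays. However, when I load the data into a Pandas DataFrame and call .to_numpy() on the columns needed for training, the following error keeps appearing: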

ValueError                                Traceback (most recent call last)
<ipython-input-13-1444bf2a605c> in <module>()
     16     clf = ak.TextClassifier(overwrite=True, max_trials=2)
     17 
---> 18     clf.fit(x_train, y_train, epochs=3, callbacks=cbs)
     19 
     20 

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).

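Code sections related to the error

The section that removes unneeded columns with the Pandas DataFrame .drop() method and converts the required columns to NumPy arrays with the to_numpy() function provided by Pandas.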

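import pandas as pd

# get_data is assumed to hold the location of the CSV file (defined earlier in the notebook).
df_src = pd.read_csv(get_data)

# Drop the columns that are not needed for training.
df_src = df_src.drop(columns=["Name", "Cast", "Plot", "Direction",
                              "Soundtrack", "Acting", "Cinematography"])

df_src = df_src.reset_index(drop=True)

# Review text is the input; overall sentiment is the label.
X = df_src["Review"].to_numpy()
Y = df_src["Overall Sentiment"].to_numpy()

print(X, "\n")
print("\n", Y)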
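One way to check what the arrays produced by to_numpy() actually contain is to count the Python types stored in them. The sketch below is only an illustration: the three-row CSV is made up (it is not the dataset from the question) and the names csv_text, df_demo, X_demo and type_counts are hypothetical. It assumes the CSV has an empty Review cell, which pandas reads as a missing value.

```python
import io

import pandas as pd

# Hypothetical sample data: the second row has an empty "Review" cell.
csv_text = "Review,Overall Sentiment\ngreat film,1\n,0\nboring plot,0\n"
df_demo = pd.read_csv(io.StringIO(csv_text))

X_demo = df_demo["Review"].to_numpy()

# Count how many elements of each Python type the array holds.
type_counts = {}
for value in X_demo:
    type_name = type(value).__name__
    type_counts[type_name] = type_counts.get(type_name, 0) + 1

print(type_counts)

# pandas can report the same thing directly as a count of missing values.
missing_reviews = int(df_demo["Review"].isna().sum())
print(missing_reviews)
```

If a float shows up among the strings, the text column contains a missing value (NaN), which would be consistent with the "Unsupported object type float" message in the traceback.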

The section where the error occurs: the model is trained and tested with TextClassifier() while running StratifiedKFold().

import numpy as np
import tensorflow as tf
import autokeras as ak
from sklearn import metrics

# skf is assumed to be a StratifiedKFold instance created earlier in the notebook.
fold = 0
for train, test in skf.split(X, Y):
    fold += 1
    print(f"Fold #{fold}\n")

    x_train = X[train]
    y_train = Y[train]

    x_test = X[test]
    y_test = Y[test]

    cbs = [tf.keras.callbacks.EarlyStopping(patience=3)]

    clf = ak.TextClassifier(overwrite=True, max_trials=2)

    # The traceback points to this line.
    clf.fit(x_train, y_train, epochs=3, callbacks=cbs)

    pred = clf.predict(x_test)  # predictions come back as lists of strings

    ceval = clf.evaluate(x_test, y_test)

    metrics_test = metrics.classification_report(y_test, np.array(list(pred), dtype=int))

    print(metrics_test, "\n")

    print(f"Fold #{fold} finished\n")

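Additional information

I have shared the complete code related to the error through Google Colab, which you can use to help me diagnose the problem.

Edit notes

I have already tried possible fixes such as: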

x_train = np.asarray(x_train).astype(np.float32)
y_train = np.asarray(y_train).astype(np.float32)

or

x_train = tf.data.Dataset.from_tensor_slices((x_train,))
y_train = tf.data.Dataset.from_tensor_slices((y_train,))

However, the problem persists.
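A note on the first attempt: casting to np.float32 cannot succeed for an array of review text, because a sentence has no numeric representation, so the cast raises its own ValueError instead of fixing the original one. The sketch below shows this on a made-up two-element array (x_demo and outcome are hypothetical names, and the strings are not from the real dataset).

```python
import numpy as np

# Hypothetical stand-in for the review text; not the real data.
x_demo = np.array(["great film", "boring plot"], dtype=object)

try:
    np.asarray(x_demo).astype(np.float32)
    outcome = "converted"
except ValueError as exc:
    # Text cannot be parsed as a number, so NumPy refuses the cast.
    outcome = f"ValueError: {exc}"

print(outcome)
```

Neither attempt inspects or removes unusual entries in the array, so any missing value that is already in x_train is still there afterwards.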

Tags: pandas, numpy, tensorflow, keras, auto-keras

1 Answer

Stack Overflow user

Accepted answer

Answered 2022-05-06 03:44:31

One of the strings is equal to nan. Simply remove that entry and its corresponding label.
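A minimal sketch of how that suggestion could be applied, assuming the missing value comes from an empty cell in the CSV: drop the incomplete rows in the DataFrame before calling to_numpy(), so that the text and its label are removed together. The sample CSV here is made up for illustration; only the column names "Review" and "Overall Sentiment" come from the question, and df_clean is a hypothetical name.

```python
import io

import pandas as pd

# Hypothetical sample data: the second row has an empty "Review" cell.
csv_text = "Review,Overall Sentiment\ngreat film,1\n,0\nboring plot,0\n"
df_src = pd.read_csv(io.StringIO(csv_text))

# Drop every row whose review or label is missing, then renumber the rows.
df_clean = df_src.dropna(subset=["Review", "Overall Sentiment"])
df_clean = df_clean.reset_index(drop=True)

X = df_clean["Review"].to_numpy()
Y = df_clean["Overall Sentiment"].to_numpy()

print(len(df_src), "rows before,", len(df_clean), "rows after")
```

Because the row is dropped from the DataFrame rather than from one array, X and Y stay aligned, and the indices produced later by skf.split(X, Y) refer to the cleaned data.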

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/72140192
