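I want to create a plot similar to this image in order to compare multiple dimensions of my dataset. The dataset is not a predefined one. I managed to display the data correctly with a single color, but I want one color for y=0 and another for y=1 so that I can compare the points, just like in the image of the iris dataset. As soon as I include hue='y' in the sns.pairplot call, the code no longer runs to the end.
I also do not understand the console output. Where is the problem?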
import seaborn as sns
import pandas as pd
sns.set(style="ticks", color_codes=True)
dataframe = pd.DataFrame(dict(F1=X[:, 0], F2=X[:, 1], F3=X[:, 2], F4=X[:, 3], y=y))
print(dataframe)
g = sns.pairplot(dataframe, hue='y')
This is the printed output of dataframe. It looks fine to me:
F1 F2 F3 F4 y
0 3.173182 2.849991 2.497907 2.851715 0.0
1 2.468625 -0.216985 0.275206 1.232518 1.0
2 2.398419 2.258931 2.255533 4.895872 0.0
3 1.379937 1.041677 1.165911 1.992650 1.0
4 2.489665 2.269068 4.129961 2.218203 0.0
5 4.140160 2.809088 2.973027 3.553128 0.0
6 2.997969 1.701299 2.978875 1.946793 0.0
7 3.864436 3.554276 3.568455 2.839489 0.0
8 -0.000605 1.376971 1.128350 1.293777 1.0
9 2.398057 1.180861 2.400801 2.264726 1.0
10 0.997385 -0.560205 0.954628 2.788858 1.0
... ... ... ... ... ...
3990 3.334553 4.576306 2.470476 3.032781 0.0
3991 1.465784 2.304793 1.267303 -0.030802 1.0
3992 0.505905 -0.280769 -1.223464 1.077305 1.0
3993 2.581596 3.924394 3.878303 2.579366 0.0
3994 4.362067 2.247818 2.948595 1.906314 0.0
3995 2.310546 0.006672 2.382227 1.940343 1.0
3996 -0.944635 1.387136 0.604135 2.421478 1.0
3997 1.290999 1.485965 0.262792 0.899340 1.0
3998 0.864532 1.759607 1.118346 1.038935 1.0
3999 1.819110 2.218838 3.927945 2.593009 0.0
[4000 rows x 5 columns]
But in the end I get this error:
Traceback (most recent call last):
File "/Users//PycharmProjects//V3_multiTops/vergleich.py", line 131, in <module>
g = sns.pairplot(dataframe, hue='y')
File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/seaborn/axisgrid.py", line 2111, in pairplot
grid.map_diag(kdeplot, **diag_kws)
File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/seaborn/axisgrid.py", line 1399, in map_diag
func(data_k, label=label_k, color=color, **kwargs)
File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/seaborn/distributions.py", line 691, in kdeplot
cumulative=cumulative, **kwargs)
File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/seaborn/distributions.py", line 294, in _univariate_kdeplot
x, y = _scipy_univariate_kde(data, bw, gridsize, cut, clip)
File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/seaborn/distributions.py", line 366, in _scipy_univariate_kde
kde = stats.gaussian_kde(data, bw_method=bw)
File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/scipy/stats/kde.py", line 172, in __init__
self.set_bandwidth(bw_method=bw_method)
File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/scipy/stats/kde.py", line 499, in set_bandwidth
self._compute_covariance()
File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/scipy/stats/kde.py", line 510, in _compute_covariance
self._data_inv_cov = linalg.inv(self._data_covariance)
File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/scipy/linalg/basic.py", line 975, in inv
raise LinAlgError("singular matrix")
numpy.linalg.linalg.LinAlgError: singular matrix
I think I am doing something wrong with sns.pairplot() that I do not understand yet. Can you explain it to me?
Answered on 2019-01-22 23:37:01
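The problem seems to be that the "y" column is itself numeric. It is therefore included as a row and column of the pair grid, which is probably not what you want anyway. To select the variables that take part in the grid, use the vars keyword of pairplot.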
sns.pairplot(df, vars=df.columns[:-1], hue="y")
The reason the iris dataset works without specifying vars is that its hue column is not numeric. Non-numeric columns are not included in the grid.
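Following the same reasoning, another option is to make the hue column non-numeric yourself, so that it resembles the species column of iris. The snippet below is only a sketch under assumptions: the helper name with_class_labels and the label texts "class 0" and "class 1" are made up for illustration, and random data stands in for the real dataset.

```python
import numpy as np
import pandas as pd


def with_class_labels(df, hue="y"):
    """Return a copy whose hue column holds text labels instead of numbers.

    Hypothetical helper; the label texts are arbitrary.
    """
    out = df.copy()
    out[hue] = out[hue].map({0.0: "class 0", 1.0: "class 1"})
    return out


# Stand-in data with the same layout as the question: F1..F4 plus a 0/1 label.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(300, 4)), columns=["F1", "F2", "F3", "F4"])
df["y"] = rng.choice([0.0, 1.0], size=len(df))

labeled = with_class_labels(df)

# Only the feature columns are still numeric; "y" now carries text labels.
numeric_columns = list(labeled.select_dtypes(include="number").columns)
print(numeric_columns)
```

With a frame like this, sns.pairplot(labeled, hue="y") should behave like the iris case, because y is no longer a candidate grid variable. Passing vars explicitly remains the more direct fix, since it does not depend on how a given seaborn version detects numeric columns.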
Complete example:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(300, 4), columns=[f"F{i+1}" for i in range(4)])
df["y"] = np.random.choice([1., 0.], size=len(df))
sns.pairplot(df, vars=df.columns[:-1], hue="y")
plt.show()
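The "singular matrix" error in the traceback is raised while a KDE is fitted for a diagonal panel. When y is both a grid variable and the hue, every hue group contains a single repeated value of y, so its variance is zero and the covariance matrix cannot be inverted. A quick way to confirm this on your own data is to look at the per-group variances. This is a minimal sketch: the helper name within_group_variance and the column name y_as_variable are hypothetical, and the data is random stand-in data.

```python
import numpy as np
import pandas as pd


def within_group_variance(df, hue):
    """Variance of every other column inside each hue group (hypothetical helper)."""
    return df.groupby(hue).var()


# Stand-in data with the same layout as the question: F1..F4 plus a 0/1 label.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(300, 4)), columns=["F1", "F2", "F3", "F4"])
df["y"] = rng.choice([0.0, 1.0], size=len(df))

# groupby drops the grouping column from the result, so keep a copy of y
# as an ordinary column to see how it behaves as a grid variable.
check = within_group_variance(df.assign(y_as_variable=df["y"]), "y")
print(check)
```

The y_as_variable column shows zero variance in both groups, while F1 to F4 do not, which matches the advice to leave y out of vars.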
https://stackoverflow.com/questions/54317168