Suppose I have 5 columns:
pd.DataFrame({
'Column1': [1, 2, 3, 4, 5, 6, 7, 8, 9],
'Column2': [4, 3, 6, 8, 3, 4, 1, 4, 3],
'Column3': [7, 3, 3, 1, 2, 2, 3, 2, 7],
'Column4': [9, 8, 7, 6, 5, 4, 3, 2, 1],
'Column5': [1, 1, 1, 1, 1, 1, 1, 1, 1]})
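Is there a function that tells me the type of relationship between each pair of columns (one-to-one, one-to-many, many-to-one, many-to-many)?
Something like: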
Column1 Column2 one-to-many
Column1 Column3 one-to-many
Column1 Column4 one-to-one
Column1 Column5 one-to-many
Column2 Column3 many-to-many
...
Column4 Column5 one-to-many
Posted on 2019-11-28 14:56:50
This should work for you:
import pandas as pd

df = pd.DataFrame({
    'Column1': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Column2': [4, 3, 6, 8, 3, 4, 1, 4, 3],
    'Column3': [7, 3, 3, 1, 2, 2, 3, 2, 7],
    'Column4': [9, 8, 7, 6, 5, 4, 3, 2, 1],
    'Column5': [1, 1, 1, 1, 1, 1, 1, 1, 1]})

def get_relation(df, col1, col2):
    # Largest number of rows sharing one value of col1 (resp. col2);
    # 1 means every value in that column appears only once.
    # .iloc[0] is positional access; plain [0] on a label-indexed Series
    # is deprecated in recent pandas.
    first_max = df[[col1, col2]].groupby(col1).count().max().iloc[0]
    second_max = df[[col1, col2]].groupby(col2).count().max().iloc[0]
    if first_max == 1:
        if second_max == 1:
            return 'one-to-one'
        else:
            return 'one-to-many'
    else:
        if second_max == 1:
            return 'many-to-one'
        else:
            return 'many-to-many'

from itertools import product

# Every ordered pair of different columns
for col_i, col_j in product(df.columns, df.columns):
    if col_i == col_j:
        continue
    print(col_i, col_j, get_relation(df, col_i, col_j))
Output:
Column1 Column2 one-to-many
Column1 Column3 one-to-many
Column1 Column4 one-to-one
Column1 Column5 one-to-many
Column2 Column1 many-to-one
Column2 Column3 many-to-many
Column2 Column4 many-to-one
Column2 Column5 many-to-many
Column3 Column1 many-to-one
Column3 Column2 many-to-many
Column3 Column4 many-to-one
Column3 Column5 many-to-many
Column4 Column1 one-to-one
Column4 Column2 one-to-many
Column4 Column3 one-to-many
Column4 Column5 one-to-many
Column5 Column1 many-to-one
Column5 Column2 many-to-many
Column5 Column3 many-to-many
Column5 Column4 many-to-one
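Because get_relation only asks whether the largest group for each column has a single row, its answer for a pair depends only on whether each of the two columns has all-distinct values. A minimal sketch of that shortcut, assuming the data has no missing values (groupby and count skip NaN, so the two approaches can differ otherwise); relation_table and LABELS are hypothetical names, and Series.is_unique is computed once per column instead of running two groupbys per pair:

```python
from itertools import permutations

import pandas as pd

# Labels keyed by (first column unique?, second column unique?),
# following the same convention as get_relation above.
LABELS = {
    (True, True): "one-to-one",
    (True, False): "one-to-many",
    (False, True): "many-to-one",
    (False, False): "many-to-many",
}


def relation_table(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: classify every ordered pair of different columns.

    Assumes no missing values, so "largest group has one row" is the same
    as "all values in the column are distinct".
    """
    # One uniqueness test per column instead of two groupbys per pair.
    unique = {col: df[col].is_unique for col in df.columns}
    rows = [
        (a, b, LABELS[(unique[a], unique[b])])
        for a, b in permutations(df.columns, 2)
    ]
    return pd.DataFrame(rows, columns=["first_column", "second_column", "relation"])


df = pd.DataFrame({
    'Column1': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Column2': [4, 3, 6, 8, 3, 4, 1, 4, 3],
    'Column3': [7, 3, 3, 1, 2, 2, 3, 2, 7],
    'Column4': [9, 8, 7, 6, 5, 4, 3, 2, 1],
    'Column5': [1, 1, 1, 1, 1, 1, 1, 1, 1]})
print(relation_table(df))
```

itertools.permutations(df.columns, 2) yields the same ordered pairs, in the same order, as the product loop with the diagonal skipped, so the rows line up with the output above.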
Posted on 2019-11-28 14:54:24
This might not be a perfect answer, but with further modification it should work:
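import numpy as np
import pandas as pd

# Number of distinct values per column, as a NumPy array
# (recent pandas no longer supports Series[:, None] indexing)
a = df.nunique().to_numpy()
is9, is1 = a == 9, a == 1   # 9 == len(df): all values distinct; 1: constant column
one_one = is9[:, None] & is9
one_many = is1[:, None]
many_one = is1[None, :]
many_many = (~is9[:, None]) & (~is9)   # not used: 'many-to-many' is the np.select default
pd.DataFrame(np.select([one_one, one_many, many_one],
                       ['one-to-one', 'one-to-many', 'many-to-one'],
                       'many-to-many'),
             df.columns, df.columns)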
Output:
Column1 Column2 Column3 Column4 Column5
Column1 one-to-one many-to-many many-to-many one-to-one many-to-one
Column2 many-to-many many-to-many many-to-many many-to-many many-to-one
Column3 many-to-many many-to-many many-to-many many-to-many many-to-one
Column4 one-to-one many-to-many many-to-many one-to-one many-to-one
Column5 one-to-many one-to-many one-to-many one-to-many one-to-many
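The table above labels a pair one-to-many whenever the row's column is constant, and one-to-one only when both columns are fully unique, which is why it disagrees with the other answers (for example Column1/Column2). One possible "further modification", assuming "one" should mean "every value in the column is distinct" as in the other two answers, keeps the NumPy broadcasting idea but builds all three masks from a single per-column uniqueness vector. The names unique, row_u, col_u and labels are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column1': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Column2': [4, 3, 6, 8, 3, 4, 1, 4, 3],
    'Column3': [7, 3, 3, 1, 2, 2, 3, 2, 7],
    'Column4': [9, 8, 7, 6, 5, 4, 3, 2, 1],
    'Column5': [1, 1, 1, 1, 1, 1, 1, 1, 1]})

# True where every value in the column is distinct. Assumes no missing
# values, since nunique() ignores NaN by default.
unique = (df.nunique() == len(df)).to_numpy()

row_u = unique[:, None]  # shape (n, 1): is the row's column unique?
col_u = unique[None, :]  # shape (1, n): is the column's column unique?

# Each mask broadcasts to shape (n, n); np.select picks the first matching
# label and falls back to 'many-to-many'.
labels = pd.DataFrame(
    np.select([row_u & col_u, row_u & ~col_u, ~row_u & col_u],
              ['one-to-one', 'one-to-many', 'many-to-one'],
              'many-to-many'),
    index=df.columns, columns=df.columns)
print(labels)
```

Off the diagonal this gives the same labels as the get_relation loop; on the diagonal a unique column comes out one-to-one with itself and any other column many-to-many.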
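Posted on 2019-11-28 14:57:00
First, we get all combinations of the columns with itertools.product.
Then we use pd.merge with the validate argument to check which relationships "pass" the test, catching the failures with try/except.
Note that we leave out many_to_many, because that relationship is not "checked"; quoting the docs:
"many_to_many" or "m:m": allowed, but does not result in checks.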
import pandas as pd
from itertools import product

def check_cardinality(df):
    combinations_lst = list(product(df.columns, df.columns))
    relations = ['one_to_one', 'one_to_many', 'many_to_one']
    output = []
    for col1, col2 in combinations_lst:
        for relation in relations:
            try:
                # validate= raises MergeError if the keys violate the relation
                pd.merge(df[[col1]], df[[col2]], left_on=col1, right_on=col2, validate=relation)
                output.append([col1, col2, relation])
            except pd.errors.MergeError:
                continue
    return output

# Keep the first relation that passed for each pair
# (one_to_one is tried first, so it wins when both columns are unique)
cardinality = (pd.DataFrame(check_cardinality(df), columns=['first_column', 'second_column', 'cardinality'])
               .drop_duplicates(['first_column', 'second_column'])
               .reset_index(drop=True))
Output:
first_column second_column cardinality
0 Column1 Column1 one_to_one
1 Column1 Column2 one_to_many
2 Column1 Column3 one_to_many
3 Column1 Column4 one_to_one
4 Column1 Column5 one_to_many
5 Column2 Column1 many_to_one
6 Column2 Column4 many_to_one
7 Column3 Column1 many_to_one
8 Column3 Column4 many_to_one
9 Column4 Column1 one_to_one
10 Column4 Column2 one_to_many
11 Column4 Column3 one_to_many
12 Column4 Column4 one_to_one
13 Column4 Column5 one_to_many
14 Column5 Column1 many_to_one
15 Column5 Column4 many_to_one
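To see what a single check inside check_cardinality does, the sketch below runs pd.merge with each validate value on one pair (Column1, Column2) and records which values raise pandas.errors.MergeError. passes is a hypothetical helper name. It also illustrates the quoted documentation: many_to_many is accepted without any check, which is why check_cardinality leaves it out.

```python
import pandas as pd

df = pd.DataFrame({
    'Column1': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Column2': [4, 3, 6, 8, 3, 4, 1, 4, 3]})


def passes(left: pd.DataFrame, right: pd.DataFrame,
           key_l: str, key_r: str, relation: str) -> bool:
    """Hypothetical helper: True if pd.merge accepts `relation` for these keys."""
    try:
        pd.merge(left, right, left_on=key_l, right_on=key_r, validate=relation)
        return True
    except pd.errors.MergeError:
        # Raised when the uniqueness requirement of `relation` is violated.
        return False


results = {
    relation: passes(df[['Column1']], df[['Column2']], 'Column1', 'Column2', relation)
    for relation in ['one_to_one', 'one_to_many', 'many_to_one', 'many_to_many']
}
print(results)
# {'one_to_one': False, 'one_to_many': True, 'many_to_one': False, 'many_to_many': True}
```

Column1 is unique but Column2 is not, so only one_to_many passes among the checked relations, matching row 1 of the output above.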
https://stackoverflow.com/questions/59091196