我正在尝试用另一个数据框列中的值替换另一个数据框列中的一些值。下面是数据框的外观。df2
有很多行和列。
df1
0 1029
0 aaaaa Green
1 bbbbb Green
2 fffff Blue
3 xxxxx Blue
4 zzzzz Green
df2
0 1 2 3 .... 1029
0 aaaaa 1 NaN 14 NaN
1 bbbbb 1 NaN 14 NaN
2 ccccc 1 NaN 14 Blue
3 ddddd 1 NaN 14 Blue
...
25 yyyyy 1 NaN 14 Blue
26 zzzzz 1 NaN 14 Blue
最终的df应该如下所示
0 1 2 3 .... 1029
0 aaaaa 1 NaN 14 Green
1 bbbbb 1 NaN 14 Green
2 ccccc 1 NaN 14 Blue
3 ddddd 1 NaN 14 Blue
...
25 yyyyy 1 NaN 14 Blue
26 zzzzz 1 NaN 14 Green
因此,基本上需要做的是,需要匹配df1[0]
和df[2]
,然后对于匹配的行,df2[1029]
需要将值替换为df1[1029]
中的相应行。我不想丢失df1['1029']
之外的df2['1029']
中的任何值
我相信python中的re
模块可以做到这一点?这就是我到目前为止所知道的:
import re
for line in replace:
line = re.sub(df1['1029'],
'1029',
line.rstrip())
print(line)
但它肯定不起作用。
我也可以像在merged1 = df1.merge(df2, left_index=True, right_index=True, how='inner')
中一样使用merge,但它不会替换内联的值。
发布于 2018-07-30 07:02:45
您需要:
df1 = pd.DataFrame({'0':['aaaaa','bbbbb','fffff','xxxxx','zzzzz'], '1029':['Green','Green','Blue','Blue','Green']})
df2 = pd.DataFrame({'0':['aaaa','bbbb','ccccc','ddddd','yyyyy','zzzzz',], '1029':[None,None,'Blue','Blue','Blue','Blue']})
# Fill NaNs
df2['1029'] = df2['1029'].fillna(df1['1029'])
# Merge the dataframes
df_ = df2.merge(df1, how='left', on=['0'])
df_['1029'] = np.where(df_['1029_y'].isna(), df_['1029_x'], df_['1029_y'])
df_.drop(['1029_y','1029_x'],1,inplace=True)
print(df_)
输出:
0 1029
0 aaaa Green
1 bbbb Green
2 ccccc Blue
3 ddddd Blue
4 yyyyy Blue
5 zzzzz Green
发布于 2018-07-30 07:00:47
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'0':['aa','bb','ff','xx', 'zz'], '1029':['Green', 'Green', 'Blue', 'Blue', 'Green']})
df2 = pd.DataFrame({'0':['aa','bb','cc','dd','ff','gg','hh','xx','yy', 'zz'], '1': [1]*10, '2': [np.nan]*10, '1029':[np.nan, np.nan, 'Blue', 'Blue', np.nan, np.nan, 'Blue', 'Green', 'Blue', 'Blue']})
df1
0 1029
0 aa Green
1 bb Green
2 ff Blue
3 xx Blue
4 zz Green
df2
0 1 1029 2
0 aa 1 NaN NaN
1 bb 1 NaN NaN
2 cc 1 Blue NaN
3 dd 1 Blue NaN
4 ff 1 NaN NaN
5 gg 1 NaN NaN
6 hh 1 Blue NaN
7 xx 1 Green NaN
8 yy 1 Blue NaN
9 zz 1 Blue NaN
如果两个数据框中的列'0‘已排序,则此操作将起作用。
df2.loc[(df2['1029'].isna() & df2['0'].isin(df1['0'])), '1029'] = df1['1029'][df2['0'].isin(df1['0'])].tolist()
df2
0 1 1029 2
0 aa 1 Green NaN
1 bb 1 Green NaN
2 cc 1 Blue NaN
3 dd 1 Blue NaN
4 ff 1 Green NaN
5 gg 1 NaN NaN
6 hh 1 Blue NaN
7 xx 1 Green NaN
8 yy 1 Blue NaN
9 zz 1 Blue NaN
https://stackoverflow.com/questions/51587685
复制相似问题