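I need to add a new key-value pair to a pandas DataFrame column based on a condition. The target column holds dictionaries. If the condition is true, the pair should be created; otherwise nothing needs to happen. I am trying to do this with np.where: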
df = pd.DataFrame({"amenity": ["1","2","3","4"], "tags": [{"building":"yes"},{"entrance": "yes"},{},{}], "sport": [None, "hockey", "football", None], "leisure":["multi", "some", "field", "wake"]})
leisure_var_add = ["field", "multi"]
df['tags']['sport'] = np.where((df['sport'] != None) | (df['leisure'].isin(leisure_var_add)), df['sport'], None)
df['tags']['leisure'] = np.where((df['sport'] == None) & (df['leisure'] !=None) & (~df['leisure'].isin(leisure_var_add)), df['leisure'], None)
I would like to get something like this:
  amenity  tags                                       sport     leisure
0 1        {'building': 'yes', 'sport': 'multi'}      None      multi
1 2        {'entrance': 'yes', 'sport': 'hockey'}     hockey    some
2 3        {'sport': 'football', 'leisure': 'field'}  football  field
3 4        {'leisure': 'wake'}                        None      wake
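I have already done this with a loop that visits each row and uses index operations, but that way I lose all the benefits of pandas. Do you know how to implement it?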
Answered 2021-10-22 04:46:57
Use a comprehension:
df['tags'] = df[['sport', 'leisure']] \
.apply(lambda x: {k: v for k, v in x[x.notna()].items()}, axis=1)
Output:
>>> df
   amenity                                       tags     sport  leisure
0        1                       {'leisure': 'multi'}      None    multi
1        2     {'sport': 'hockey', 'leisure': 'some'}    hockey     some
2        3  {'sport': 'football', 'leisure': 'field'}  football    field
3        4                        {'leisure': 'wake'}      None     wake
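Note that the comprehension above replaces the tags column, so the existing 'building' and 'entrance' keys are lost. The following is only a rough sketch of one way to merge the new pairs into the dictionaries already stored in tags. The helper name `merged` is hypothetical, and it simply copies every non-null sport and leisure value; any additional rule based on leisure_var_add from the question would have to be added to the filter.

```python
import pandas as pd

df = pd.DataFrame({
    "amenity": ["1", "2", "3", "4"],
    "tags": [{"building": "yes"}, {"entrance": "yes"}, {}, {}],
    "sport": [None, "hockey", "football", None],
    "leisure": ["multi", "some", "field", "wake"],
})


def merged(tags, **extra):
    """Hypothetical helper: return a new dict with the existing tags
    plus every extra key whose value is not null."""
    return {**tags, **{k: v for k, v in extra.items() if pd.notna(v)}}


# Build one merged dict per row and assign the whole column at once.
df["tags"] = [
    merged(tags, sport=sport, leisure=leisure)
    for tags, sport, leisure in zip(df["tags"], df["sport"], df["leisure"])
]

print(df["tags"].tolist())
```

Because `merged` builds a new dictionary for each row, the original dict objects are not mutated, which should make the step safe to re-run on a fresh DataFrame.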
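Answered 2021-10-22 12:15:54
I used apply to move all the data into columns, then iterated over the rows and built the tags dictionary from the column data (excluding amenity).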
import pandas as pd

df = pd.DataFrame({"amenity": ["1","2","3","4"], "tags": [{"building":"yes"},{"entrance": "yes"},{},{}], "sport": [None, "hockey", "football", None], "leisure":["multi", "some", "field", "wake"]})

def EmptyList(x):
    # Return the first element of the list, or None if the list is empty
    if len(x) > 0:
        return x[0]
    else:
        return None

# Pull the 'building' and 'entrance' values out of the tags dicts into their own columns
df['building'] = df['tags'].apply(lambda x: [v for k, v in x.items() if k == 'building']).apply(EmptyList)
df['entrance'] = df['tags'].apply(lambda x: [v for k, v in x.items() if k == 'entrance']).apply(EmptyList)
df.drop(['tags'], inplace=True, axis=1)
print(df)

tags_dict = {}
columns = df.columns
for key, value in df.iterrows():
    for column in columns:
        # Keep every non-null value except the amenity column
        if pd.notna(value[column]) and column != 'amenity':
            tags_dict[column] = value[column]
    # Store the dictionary as its string representation
    df.loc[key, 'tags'] = str(tags_dict)
    tags_dict.clear()
print(df)
Output:
   amenity     sport  leisure  building  entrance  \
0        1      None    multi       yes      None
1        2    hockey     some      None       yes
2        3  football    field      None      None
3        4      None     wake      None      None

                                                tags
0            {'leisure': 'multi', 'building': 'yes'}
1  {'sport': 'hockey', 'leisure': 'some', 'entran...
2          {'sport': 'football', 'leisure': 'field'}
3                                {'leisure': 'wake'}
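The loop above stores each dictionary as a string (via str), so the tags column ends up holding text rather than dicts. As a hedged alternative, assuming the data has already been flattened into columns as in this answer, the per-row dictionaries could be built in one pass with DataFrame.to_dict("records"). The frame name `flat` is made up for this sketch.

```python
import pandas as pd

# Hypothetical frame mirroring the flattened columns printed above.
flat = pd.DataFrame({
    "amenity": ["1", "2", "3", "4"],
    "sport": [None, "hockey", "football", None],
    "leisure": ["multi", "some", "field", "wake"],
    "building": ["yes", None, None, None],
    "entrance": [None, "yes", None, None],
})

# One dict per row, covering every column except amenity.
records = flat.drop(columns="amenity").to_dict("records")

# Drop the null entries and keep the results as real dict objects.
flat["tags"] = [
    {key: value for key, value in record.items() if pd.notna(value)}
    for record in records
]

print(flat["tags"].tolist())
```

Keeping real dicts means later code can read a key directly, for example `flat["tags"].iloc[0]["building"]`, without parsing a string first.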
https://stackoverflow.com/questions/69676992