I have the following pandas DataFrame, which I resample to daily frequency and average before finally taking a 30-day rolling mean:
import pandas as pd
df = pd.DataFrame()
df.index = ['2009-01-04', '2009-01-05', '2009-01-05', '2009-01-06', '2009-01-06', '2009-01-07', '2009-01-07', '2009-01-07']
df['score1'] = [84, 28, 38, 48, 23, 38, 22, 37]
df['score2'] = [83, 43, 12, 93, 64, 28, 29, 12]
df['score3'] = [92, 33, 11, 48, 23, 22, 12, 38]
df['score4'] = [43, 23, 41, 75, 93, 93, 23, 21]
df['condition1'] = [0, 0, 1, 0, 1, 0, 1, 0]
df['condition2'] = [1, 0, 1, 0, 0, 0, 0, 1]
df['condition3'] = [0, 0, 0, 1, 1, 0, 0, 1]
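df.index = pd.to_datetime(df.index)  # resample needs a DatetimeIndex, not strings
df = df.resample('D').mean()  # the older how='mean' form is not accepted by current pandas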
df = df.rolling(30, min_periods=1).mean()
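In this case I want to compute a "conditional mean": when any of the 3 conditions is == 1, only the rows that contain a 1 should go into the mean. For example, the third row satisfies condition1 and condition2, so for '2009-01-05' we average only 38, 12, 11 and 41 and ignore 28, 43, 33, 23.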
Posted on 2020-01-26 22:12:35
# convert your index to datetime:
df.index = pd.to_datetime(df.index)
# select the rows that meet condition:
df = df[df.loc[:,['condition1','condition2','condition3']].sum(axis=1)>0]
# resample
df = df.resample('D').mean() # updated syntax
>>> print(df)
            score1  score2  score3  score4  condition1  condition2  condition3
2009-01-04    84.0    83.0    92.0    43.0         0.0         1.0         0.0
2009-01-05    38.0    12.0    11.0    41.0         1.0         1.0         0.0
2009-01-06    35.5    78.5    35.5    84.0         0.5         0.0         1.0
2009-01-07    29.5    20.5    25.0    22.0         0.5         0.5         0.5
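The snippet above stops after the daily resample. If the 30-day rolling mean from the question is still wanted, something like the following might work. It is only a sketch: it rebuilds the sample data so it can run on its own, and the names `cond_cols`, `score_cols`, `mask`, `daily` and `rolled` are my own, not from the original post. It also assumes that only the score columns need averaging.

```python
import pandas as pd

# Sample data from the question, with the index parsed as datetimes up front
dates = ['2009-01-04', '2009-01-05', '2009-01-05', '2009-01-06',
         '2009-01-06', '2009-01-07', '2009-01-07', '2009-01-07']
df = pd.DataFrame({
    'score1': [84, 28, 38, 48, 23, 38, 22, 37],
    'score2': [83, 43, 12, 93, 64, 28, 29, 12],
    'score3': [92, 33, 11, 48, 23, 22, 12, 38],
    'score4': [43, 23, 41, 75, 93, 93, 23, 21],
    'condition1': [0, 0, 1, 0, 1, 0, 1, 0],
    'condition2': [1, 0, 1, 0, 0, 0, 0, 1],
    'condition3': [0, 0, 0, 1, 1, 0, 0, 1],
}, index=pd.to_datetime(dates))

# Hypothetical helper names, chosen for this sketch only
cond_cols = ['condition1', 'condition2', 'condition3']
score_cols = ['score1', 'score2', 'score3', 'score4']

# True for rows where at least one condition equals 1
mask = df[cond_cols].eq(1).any(axis=1)

# Daily mean of the scores, computed over the qualifying rows only
daily = df.loc[mask, score_cols].resample('D').mean()

# Rolling mean over up to 30 daily rows; min_periods=1 lets early days
# produce a value before a full 30-day window is available
rolled = daily.rolling(30, min_periods=1).mean()

print(rolled)
```

A day with no qualifying rows comes out of the resample as NaN, and the rolling mean skips NaN values, so such days should not drag the average toward zero.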
https://stackoverflow.com/questions/59922108