我有这个数据框:
ID Date X 123_Var 456_Var 789_Var
A 16-07-19 3 777 250 810
A 17-07-19 9 637 121 529
A 20-07-19 2 295 272 490
A 21-07-19 3 778 600 544
A 22-07-19 6 741 792 907
A 25-07-19 6 435 416 820
A 26-07-19 8 590 455 342
A 27-07-19 6 763 476 753
A 02-08-19 6 717 211 454
A 03-08-19 6 152 442 475
A 05-08-19 6 564 340 302
A 07-08-19 6 105 929 633
A 08-08-19 6 948 366 586
B 07-08-19 4 509 690 406
B 08-08-19 2 413 725 414
B 12-08-19 2 170 702 912
B 13-08-19 3 851 616 477
B 14-08-19 9 475 447 555
B 15-08-19 1 412 403 708
B 17-08-19 2 299 537 321
B 18-08-19 4 310 119 125
我想展示mean
的价值n
最后几天(使用Date
柱子),不包括当天的值.
我正在使用这段代码(我应该做什么来解决这个问题?):
n = 4
cols = list(df.filter(regex='Var').columns)
df = df.set_index('Date')
df[cols] = (df.groupby('ID').rolling(window=f'{n}D')[cols].mean()
.reset_index(0,drop=True).add_suffix(f'_{n}'))
df.reset_index(inplace=True)
预期结果:
ID Date X 123_Var 456_Var 789_Var 123_Var_4 456_Var_4 789_Var_4
A 16-07-19 3 777 250 810 NaN NaN NaN
A 17-07-19 9 637 121 529 777.000000 250.000000 810.0
A 20-07-19 2 295 272 490 707.000000 185.500000 669.5
A 21-07-19 3 778 600 544 466.000000 196.500000 509.5
A 22-07-19 6 741 792 907 536.500000 436.000000 517.0
A 25-07-19 6 435 416 820 759.500000 696.000000 725.5
A 26-07-19 8 590 455 342 588.000000 604.000000 863.5
A 27-07-19 6 763 476 753 512.500000 435.500000 581.0
A 02-08-19 6 717 211 454 NaN NaN NaN
A 03-08-19 6 152 442 475 717.000000 211.000000 454.0
A 05-08-19 6 564 340 302 434.500000 326.500000 464.5
A 07-08-19 6 105 929 633 358.000000 391.000000 388.5
A 08-08-19 6 948 366 586 334.500000 634.500000 467.5
B 07-08-19 4 509 690 406 NaN NaN NaN
B 08-08-19 2 413 725 414 509.000000 690.000000 406.0
B 12-08-19 2 170 702 912 413.000000 725.000000 414.0
B 13-08-19 3 851 616 477 291.500000 713.500000 663.0
B 14-08-19 9 475 447 555 510.500000 659.000000 694.5
B 15-08-19 1 412 403 708 498.666667 588.333333 648.0
B 17-08-19 2 299 537 321 579.333333 488.666667 580.0
B 18-08-19 4 310 119 125 395.333333 462.333333 528.0
我改变unutbu solution工作于rolling
:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
n = 5
cols = df.filter(regex='Var').columns
df = df.set_index('Date')
df_ = df.set_index('ID', append=True).swaplevel(1,0)
df1 = df.groupby('ID').rolling(window=f'{n}D')[cols].count()
df2 = df.groupby('ID').rolling(window=f'{n}D')[cols].mean()
df3 = (df1.mul(df2)
.sub(df_[cols])
.div(df1[cols].sub(1)).add_suffix(f'_{n}')
)
df4 = df_.join(df3)
print (df4)
X 123_Var 456_Var 789_Var 123_Var_5 456_Var_5 789_Var_5
ID Date
A 2019-07-16 3 777 250 810 NaN NaN NaN
2019-07-17 9 637 121 529 777.000000 250.000000 810.0
2019-07-20 2 295 272 490 707.000000 185.500000 669.5
2019-07-21 3 778 600 544 466.000000 196.500000 509.5
2019-07-22 6 741 792 907 536.500000 436.000000 517.0
2019-07-25 6 435 416 820 759.500000 696.000000 725.5
2019-07-26 8 590 455 342 588.000000 604.000000 863.5
2019-07-27 6 763 476 753 512.500000 435.500000 581.0
2019-08-02 6 717 211 454 NaN NaN NaN
2019-08-03 6 152 442 475 717.000000 211.000000 454.0
2019-08-05 6 564 340 302 434.500000 326.500000 464.5
2019-08-07 6 105 929 633 358.000000 391.000000 388.5
2019-08-08 6 948 366 586 334.500000 634.500000 467.5
B 2019-08-07 4 509 690 406 NaN NaN NaN
2019-08-08 2 413 725 414 509.000000 690.000000 406.0
2019-08-12 2 170 702 912 413.000000 725.000000 414.0
2019-08-13 3 851 616 477 170.000000 702.000000 912.0
2019-08-14 9 475 447 555 510.500000 659.000000 694.5
2019-08-15 1 412 403 708 498.666667 588.333333 648.0
2019-08-17 2 299 537 321 579.333333 488.666667 580.0
2019-08-18 4 310 119 125 395.333333 462.333333 528.0
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)