1. Problem description
Given the DataFrame below, how can we extract the month-end row (the last record in each month) for each Code?
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Code": ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd'],
    "Time": ['20180101', '20180130', '20180201', '20180228', '20180302', '20180401', '20180430', '20180502', '20180530', '20180629'],
    "X": np.arange(0, 20, 2),
    "Y": np.arange(1, 20, 2)})
df
Out[6]:
  Code      Time   X   Y
0    a  20180101   0   1
1    a  20180130   2   3
2    a  20180201   4   5
3    a  20180228   6   7
4    b  20180302   8   9
5    b  20180401  10  11
6    b  20180430  12  13
7    c  20180502  14  15
8    c  20180530  16  17
9    d  20180629  18  19
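In other words, we want the result below, where each row is the last available record of its (Code, month) group. Note that this is not necessarily a calendar month-end: for Code b in March, the only record is 20180302.
  Code    Time   X   Y
0    a  201801   2   3
1    a  201802   6   7
2    b  201803   8   9
3    b  201804  12  13
4    c  201805  16  17
5    d  201806  18  19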
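2. Solution
Convert Time to a year-month key (e.g. 20180130 becomes 201801), group by Code and that key, then take the last row of each group. Because 'last' follows row order, this relies on the rows already being sorted by date within each Code, as they are in the sample data.
df["Time"] = pd.to_datetime(df["Time"]).dt.strftime("%Y%m").astype(np.int64)
df.groupby(["Code", "Time"]).agg('last')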
Out[9]:
              X   Y
Code Time
a    201801   2   3
     201802   6   7
b    201803   8   9
     201804  12  13
c    201805  16  17
d    201806  18  19
To turn the group keys back into ordinary columns, append reset_index():
df.groupby(["Code", "Time"]).agg('last').reset_index()
Out[11]:
  Code    Time   X   Y
0    a  201801   2   3
1    a  201802   6   7
2    b  201803   8   9
3    b  201804  12  13
4    c  201805  16  17
5    d  201806  18  19
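The approach above overwrites Time with a year-month key and depends on row order. As a hedged alternative, the sketch below picks the row with the latest date in each (Code, month) group via idxmax, so it keeps the original Time value and does not need pre-sorted input. The names `shuffled`, `dates`, `month`, `last_idx` and `month_end` are illustrative only, and the shuffle exists just to demonstrate order independence.

```python
import numpy as np
import pandas as pd

# Same sample data as in the problem description.
df = pd.DataFrame({
    "Code": ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd'],
    "Time": ['20180101', '20180130', '20180201', '20180228', '20180302',
             '20180401', '20180430', '20180502', '20180530', '20180629'],
    "X": np.arange(0, 20, 2),
    "Y": np.arange(1, 20, 2),
})

# Shuffle the rows on purpose: the result should not depend on row order.
shuffled = df.sample(frac=1, random_state=0)

# Parse the date strings without modifying the Time column.
dates = pd.to_datetime(shuffled["Time"], format="%Y%m%d").rename("Date")
month = dates.dt.to_period("M").rename("Month")

# For each (Code, month) group, find the index label of the latest date.
last_idx = dates.groupby([shuffled["Code"], month]).idxmax()

# Select those rows; Time still holds the original date strings.
month_end = shuffled.loc[last_idx.to_numpy()].reset_index(drop=True)
print(month_end)
```

If several rows in a group share the same latest date, idxmax returns only the first of them, so this sketch yields exactly one row per group.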
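3. Note: without grouping by Code, and keeping the original dates (e.g. 2018-01-01)
If there is no Code column to group by and the result should keep the original dates rather than a year-month key, the following can be used: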
df = pd.DataFrame({
"Time":['20180101', '20180130', '20180201', '20180228', '20180302', '20180401', '20180430', '20180502', '20180530', '20180629'],
'X':np.arange(0,20,2),
'Y':np.arange(1,20,2)})
df["Time"] = pd.to_datetime(df["Time"])
df.set_index("Time")
Out[16]:
             X   Y
Time
2018-01-01   0   1
2018-01-30   2   3
2018-02-01   4   5
2018-02-28   6   7
2018-03-02   8   9
2018-04-01  10  11
2018-04-30  12  13
2018-05-02  14  15
2018-05-30  16  17
2018-06-29  18  19
df.set_index("Time", inplace=True)
df.loc[df.groupby(df.index.to_period('M')).apply(lambda x: x.index.max())]
Out[19]:
             X   Y
Time
2018-01-30   2   3
2018-02-28   6   7
2018-03-02   8   9
2018-04-30  12  13
2018-05-30  16  17
2018-06-29  18  19
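A possibly simpler variant of the same idea, shown here as a sketch rather than the original author's method: group the rows by month period and keep the last row of each group with tail(1). This preserves the original DatetimeIndex, and unlike a label-based df.loc lookup it returns exactly one row per month even when a date appears more than once in the index. The sort_index() call is an added assumption that makes "last" mean "latest date"; `month_end` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Same sample data as in section 3.
df = pd.DataFrame({
    "Time": ['20180101', '20180130', '20180201', '20180228', '20180302',
             '20180401', '20180430', '20180502', '20180530', '20180629'],
    "X": np.arange(0, 20, 2),
    "Y": np.arange(1, 20, 2),
})
df["Time"] = pd.to_datetime(df["Time"], format="%Y%m%d")

# Index by date and sort so the last row of each month is the latest date.
df = df.set_index("Time").sort_index()

# Group by calendar month and keep the final row of every group.
month_end = df.groupby(df.index.to_period("M")).tail(1)
print(month_end)
```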
References:
"get first and last values in a groupby"
"pandas获取月底最后一个交易日对应数据" (Getting the data for the last trading day of each month with pandas)