I have the following DataFrame:
In [372]: df_2
Out[372]:
A ID3 DATETIME
0 B-028 b76cd912ff 2014-10-08 13:43:27
1 B-054 4a57ed0b02 2014-10-08 14:26:19
2 B-076 1a682034f8 2014-10-08 14:29:01
3 B-023 b76cd912ff 2014-10-08 18:39:34
4 B-023 f88g8d7sds 2014-10-08 18:40:18
5 B-033 b76cd912ff 2014-10-08 18:44:30
6 B-032 b76cd912ff 2014-10-08 18:46:00
7 B-037 b76cd912ff 2014-10-08 18:52:15
8 B-046 db959faf02 2014-10-08 18:59:59
9 B-053 b76cd912ff 2014-10-08 19:17:48
10 B-065 b76cd912ff 2014-10-08 19:21:38
I want to find the difference between consecutive entries, grouped by 'ID3'.
I am trying to use transform() on a GroupBy, like this:
In [379]: df_2['diff'] = df_2.sort_values(by='DATETIME').groupby('ID3')['DATETIME'].transform(lambda x: x.diff()); df_2['diff']
Out[379]:
0 NaT
1 NaT
2 NaT
3 1970-01-01 04:56:07
4 NaT
5 1970-01-01 00:04:56
6 1970-01-01 00:01:30
7 1970-01-01 00:06:15
8 NaT
9 1970-01-01 00:25:33
10 1970-01-01 00:03:50
Name: diff, dtype: datetime64[ns]
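I also tried x.diff().astype(int) for the lambda, with exactly the same result.
The dtype of both 'DATETIME' and 'diff' is datetime64[ns].
What I want is for diff to be expressed in seconds, rather than as a timestamp relative to the epoch.
I found that I can convert df_2['diff'] to TimeDelta and then extract the seconds in a single chained call, like this: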
In [405]: df_2['diff'] = pd.to_timedelta(df_2['diff']).map(lambda x: x.total_seconds()); df_2['diff']
Out[407]:
0 NaN
1 NaN
2 NaN
3 17767.0
4 NaN
5 296.0
6 90.0
7 375.0
8 NaN
9 1533.0
10 230.0
Name: diff, dtype: float64
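For anyone who wants to reproduce this, the two steps above can be sketched as a self-contained script. The frame is rebuilt from the output shown earlier. As an assumption on my part, the seconds are extracted with the .dt.total_seconds() accessor rather than the map() call, and on recent pandas versions the intermediate result may already be timedelta64, in which case pd.to_timedelta() simply passes it through.

```python
import pandas as pd

# Rebuild the frame from the question (values copied from the printed output).
df_2 = pd.DataFrame({
    "A": ["B-028", "B-054", "B-076", "B-023", "B-023", "B-033",
          "B-032", "B-037", "B-046", "B-053", "B-065"],
    "ID3": ["b76cd912ff", "4a57ed0b02", "1a682034f8", "b76cd912ff",
            "f88g8d7sds", "b76cd912ff", "b76cd912ff", "b76cd912ff",
            "db959faf02", "b76cd912ff", "b76cd912ff"],
    "DATETIME": pd.to_datetime([
        "2014-10-08 13:43:27", "2014-10-08 14:26:19", "2014-10-08 14:29:01",
        "2014-10-08 18:39:34", "2014-10-08 18:40:18", "2014-10-08 18:44:30",
        "2014-10-08 18:46:00", "2014-10-08 18:52:15", "2014-10-08 18:59:59",
        "2014-10-08 19:17:48", "2014-10-08 19:21:38",
    ]),
})

# Step 1: per-group difference between consecutive timestamps.
diff = (
    df_2.sort_values(by="DATETIME")
        .groupby("ID3")["DATETIME"]
        .transform(lambda x: x.diff())
)

# Step 2: convert to float seconds (NaN for the first row of each group).
df_2["diff"] = pd.to_timedelta(diff).dt.total_seconds()
print(df_2["diff"])
```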
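Is there a way to achieve this (seconds as the values of df_2['diff']) in one step inside transform, without having to go through several steps along the way?
Finally, I have already tried converting to TimeDelta inside transform, without any success.
Thanks for your help!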
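One possible single-step variant is sketched below, on a small hypothetical subset of the data. It assumes a pandas version in which transform() keeps the float result of the lambda instead of coercing it back to datetime64[ns], as the output above suggests happened in my session. I have not confirmed which versions behave which way.

```python
import pandas as pd

# Small illustrative subset of the question's data (hypothetical frame name `df`).
df = pd.DataFrame({
    "ID3": ["b76cd912ff", "4a57ed0b02", "b76cd912ff", "b76cd912ff"],
    "DATETIME": pd.to_datetime([
        "2014-10-08 13:43:27", "2014-10-08 14:26:19",
        "2014-10-08 18:39:34", "2014-10-08 18:44:30",
    ]),
})

# Compute the per-group diff and convert it to seconds inside the same lambda.
df["diff"] = (
    df.sort_values(by="DATETIME")
      .groupby("ID3")["DATETIME"]
      .transform(lambda x: x.diff().dt.total_seconds())
)
print(df)
```

If the coercion back to datetime64[ns] still occurs, the two-step conversion remains the fallback.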