I have a dataframe like the one shown below:
import pandas as pd

df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1],
    'time_1': ['2173-04-03 10:00:00', '2173-04-03 10:15:00', '2173-04-03 10:30:00',
               '2173-04-03 10:45:00', '2173-04-03 11:05:00', '2173-04-03 11:15:00'],
    'val': [5, 6, 5, 6, 6, 6]
})
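I want to find the total duration for which a value occurs consecutively (in sequence). An example should make this clearer.
In the data above, 6 occurs consecutively from 10:45 to 23:59, while the other values (which can be any real number) do not occur consecutively at all.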
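To make "occurs consecutively" concrete, here is a minimal sketch of one common way to label runs of equal values: compare each value with the previous one and take a cumulative sum of the changes. The names `run_id`, `run_length` and `in_sequence` are made up for illustration, and treating a run of length 1 as "not in sequence" is an assumption about what is meant here.

```python
import pandas as pd

# The 'val' column from the example dataframe
val = pd.Series([5, 6, 5, 6, 6, 6], name='val')

# A new run starts wherever the value differs from the previous row;
# the cumulative sum of those change flags gives one id per run.
run_id = (val != val.shift()).cumsum()

# Number of rows in the run each row belongs to
run_length = val.groupby(run_id).transform('size')

# Assumption: a value is "in sequence" only if its run has more than one row
in_sequence = run_length > 1

print(pd.DataFrame({'val': val, 'run_id': run_id, 'in_sequence': in_sequence}))
```

With this labeling, only the last three rows (the 6s at 10:45, 11:05 and 11:15) share a run id, which matches the description above.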
I tried something like the following, but it does not give the expected output. It sums over all the values:
df['time_1'] = pd.to_datetime(df['time_1'])
df['seq'] = df['val'] == df['val'].shift(-1)
s = pd.to_timedelta(24, unit='h') - (df.time_1 - df.time_1.dt.normalize())
df['tdiff'] = df.groupby(df.time_1.dt.date).time_1.diff().shift(-1).fillna(s).dt.total_seconds() / 3600
df.groupby([df['seq'] == True])['tdiff'].cumsum() # do cumulative sum only when the values are in sequence
How can I do a cumulative sum over groups based on a condition?
In my expected output, the duration for 6 is 13:15, because no other value appears during the 13:15 hours that follow the first occurrence of 6 at 10:45 (24:00 hr - 10:45 gives 13:15).
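The 13:15 figure can be reproduced with a rough sketch like the one below. It rests on two assumptions that the question does not state outright: a run lasts from its first timestamp until the first timestamp of the next run, and the final run lasts until the midnight following its last timestamp. The `runs` frame and its column names are hypothetical, and the sketch only covers the single-day example, not runs that span several days.

```python
import pandas as pd

df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1],
    'time_1': ['2173-04-03 10:00:00', '2173-04-03 10:15:00', '2173-04-03 10:30:00',
               '2173-04-03 10:45:00', '2173-04-03 11:05:00', '2173-04-03 11:15:00'],
    'val': [5, 6, 5, 6, 6, 6],
})
df['time_1'] = pd.to_datetime(df['time_1'])

# One id per run of consecutive equal values
df['run_id'] = (df['val'] != df['val'].shift()).cumsum()

# One row per run: its value, first timestamp and last timestamp
runs = (df.groupby('run_id')
          .agg(val=('val', 'first'),
               start=('time_1', 'first'),
               last=('time_1', 'last'))
          .reset_index())

# Assumption: a run ends when the next run starts; the final run
# ends at the midnight following its last observation.
midnight_after = runs['last'].dt.normalize() + pd.Timedelta(days=1)
runs['end'] = runs['start'].shift(-1).fillna(midnight_after)

# Duration of each run in hours
runs['hours'] = (runs['end'] - runs['start']).dt.total_seconds() / 3600

print(runs[['val', 'start', 'end', 'hours']])
```

Under these assumptions the run of 6 that starts at 10:45 gets 13.25 hours, i.e. 13:15, while each of the single-row runs before it gets 0.25 hours.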
Test dataframe:
df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    'time_1': ['2173-04-03 12:35:00', '2173-04-03 12:50:00', '2173-04-03 12:59:00',
               '2173-04-03 13:14:00', '2173-04-03 13:37:00', '2173-04-04 11:30:00',
               '2173-04-05 16:00:00', '2173-04-05 22:00:00', '2173-04-06 04:00:00',
               '2173-04-06 04:30:00', '2173-04-06 08:00:00'],
    'val': [5, 5, 5, 5, 10, 5, 5, 8, 3, 4, 6]
})