How to add 24 hours when the time rolls over, using Python and Pandas

2024-04-26

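I have to analyze some logs and calculate a few things from them, but I am stuck on one point. Here I have tried to recreate my problem in a simple form. Suppose I have the following log file, "stackoverflow.txt":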

23:58:03.458
23:58:13.446
23:58:23.447
23:58:33.440
23:58:43.440
23:58:53.440
23:59:03.434
23:59:13.435
23:59:23.428
23:59:33.428
23:59:43.429
23:59:53.435
00:00:03.429
00:00:13.423
00:00:23.417
00:00:33.411
00:00:43.418
00:00:53.411
00:01:03.405
00:01:13.406
00:01:23.400
00:01:33.406
00:01:43.400
00:01:53.411
00:02:03.400
00:02:13.406
00:02:23.394
00:02:33.400
00:02:43.394

I used the Python program below to convert these times to milliseconds.

import pandas as pd
df = pd.read_csv("stackoverflow.txt", header=None)
# Split Time String into Hour Minutes Seconds and Milliseconds
new_df = df[0].str.split(":", n=-1, expand=True)
df['Hours'] = new_df[0]
df['Minutes'] = new_df[1]
# Split Seconds.Milliseconds information into Seconds and Milliseconds separately
new_df = new_df[2].str.split(".", n=-1, expand=True)
df['Seconds'] = new_df[0]
df['Milliseconds'] = new_df[1]
# These generated data frames are string, convert them into Integers
# df['Hours'] = df['Hours'].apply(lambda x: int(x,10)) 
# Another way of doing, good thing is that both are consuming same amount of time, checked using %time
df['Hours'] = pd.to_numeric(df['Hours'], errors='coerce')
df['Minutes'] = pd.to_numeric(df['Minutes'], errors='coerce')
df['Seconds'] = pd.to_numeric(df['Seconds'], errors='coerce')
df['Milliseconds'] = pd.to_numeric(df['Milliseconds'], errors='coerce')
# Calculate Total Time
df['Total Time(ms)'] = df['Hours']*3600000 + df['Minutes']*60000 + df['Seconds']*1000 + df['Milliseconds']
df

The output is as follows:

0   Hours   Minutes Seconds Milliseconds    Total Time(ms)
0   23:58:03.458    23  58  3   458 86283458
1   23:58:13.446    23  58  13  446 86293446
2   23:58:23.447    23  58  23  447 86303447
3   23:58:33.440    23  58  33  440 86313440
4   23:58:43.440    23  58  43  440 86323440
5   23:58:53.440    23  58  53  440 86333440
6   23:59:03.434    23  59  3   434 86343434
7   23:59:13.435    23  59  13  435 86353435
8   23:59:23.428    23  59  23  428 86363428
9   23:59:33.428    23  59  33  428 86373428
10  23:59:43.429    23  59  43  429 86383429
11  23:59:53.435    23  59  53  435 86393435
12  00:00:03.429    0   0   3   429 3429
13  00:00:13.423    0   0   13  423 13423
14  00:00:23.417    0   0   23  417 23417
15  00:00:33.411    0   0   33  411 33411
16  00:00:43.418    0   0   43  418 43418
17  00:00:53.411    0   0   53  411 53411
18  00:01:03.405    0   1   3   405 63405
19  00:01:13.406    0   1   13  406 73406
20  00:01:23.400    0   1   23  400 83400
21  00:01:33.406    0   1   33  406 93406
22  00:01:43.400    0   1   43  400 103400
23  00:01:53.411    0   1   53  411 113411
24  00:02:03.400    0   2   3   400 123400
25  00:02:13.406    0   2   13  406 133406
26  00:02:23.394    0   2   23  394 143394
27  00:02:33.400    0   2   33  400 153400
28  00:02:43.394    0   2   43  394 163394

But whenever the day changes from 23:59 to 00:00, I want to add 24 hours. I can't figure out how to do that. Can someone help me achieve this?


I suggest working with Timedeltas http://pandas.pydata.org/pandas-docs/stable/user_guide/timedeltas.html:

df = pd.read_csv("stackoverflow.txt", header=None)

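First convert the column with to_timedelta http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_timedelta.html, then take the difference between consecutive rows, compare it with Timedelta(0), and add 24 hours (written as pd.Timedelta(1, 'days') in the code) to every row from the first negative difference onward.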

td = pd.to_timedelta(df[0])
df['new'] = td.mask(td.diff().lt(pd.Timedelta(0)).cumsum().gt(0), td + pd.Timedelta(1, 'days'))
df['newint'] = (df['new'].dt.total_seconds() * 1000).astype(int)

print (df)
               0                    new    newint
0   23:58:03.458 0 days 23:58:03.458000  86283458
1   23:58:13.446 0 days 23:58:13.446000  86293446
2   23:58:23.447 0 days 23:58:23.447000  86303447
3   23:58:33.440 0 days 23:58:33.440000  86313440
4   23:58:43.440 0 days 23:58:43.440000  86323440
5   23:58:53.440 0 days 23:58:53.440000  86333440
6   23:59:03.434 0 days 23:59:03.434000  86343434
7   23:59:13.435 0 days 23:59:13.435000  86353435
8   23:59:23.428 0 days 23:59:23.428000  86363428
9   23:59:33.428 0 days 23:59:33.428000  86373428
10  23:59:43.429 0 days 23:59:43.429000  86383429
11  23:59:53.435 0 days 23:59:53.435000  86393435
12  00:00:03.429 1 days 00:00:03.429000  86403429
13  00:00:13.423 1 days 00:00:13.423000  86413423
14  00:00:23.417 1 days 00:00:23.417000  86423417
15  00:00:33.411 1 days 00:00:33.411000  86433411
16  00:00:43.418 1 days 00:00:43.418000  86443418
17  00:00:53.411 1 days 00:00:53.411000  86453411
18  00:01:03.405 1 days 00:01:03.405000  86463405
19  00:01:13.406 1 days 00:01:13.406000  86473406
20  00:01:23.400 1 days 00:01:23.400000  86483400
21  00:01:33.406 1 days 00:01:33.406000  86493406
22  00:01:43.400 1 days 00:01:43.400000  86503400
23  00:01:53.411 1 days 00:01:53.411000  86513411
24  00:02:03.400 1 days 00:02:03.400000  86523400
25  00:02:13.406 1 days 00:02:13.406000  86533406
26  00:02:23.394 1 days 00:02:23.394000  86543394
27  00:02:33.400 1 days 00:02:33.400000  86553400
28  00:02:43.394 1 days 00:02:43.394000  86563394
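
For readers who want to try this without the log file, here is a minimal sketch of the same mask-based approach on a few in-memory rows. The `log_text` string and the `after_rollover` name are assumptions made for illustration; the sketch also converts to milliseconds by floor-dividing by a one-millisecond Timedelta, which keeps the arithmetic in integers instead of going through `total_seconds()`.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for "stackoverflow.txt"
log_text = "23:59:43.429\n23:59:53.435\n00:00:03.429\n00:00:13.423\n"
df = pd.read_csv(io.StringIO(log_text), header=None)

td = pd.to_timedelta(df[0])

# True for every row from the first backward jump in time onward
after_rollover = td.diff().lt(pd.Timedelta(0)).cumsum().gt(0)

# Add one day only where the mask is True
df['new'] = td.mask(after_rollover, td + pd.Timedelta(1, 'days'))

# Integer milliseconds via floor division by a 1 ms Timedelta
df['newint'] = df['new'] // pd.Timedelta(milliseconds=1)

print(df)
```

Because the mask is a plain True/False flag, every row after the first rollover receives exactly one extra day, no matter how many times the clock wraps afterwards.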

Solution if the data spans multiple days: add 1 day at the first rollover, 2 days at the next one, and so on.

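Take the differences, flag the negative ones, compute their cumulative sum, and convert the result to day timedeltas, which are then added to the original data: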

print (df)
               0
0   23:59:23.428
1   23:59:33.428
2   23:59:43.429
3   23:59:53.435
4   00:00:03.429
5   00:00:13.423
6   00:00:23.417
7   00:00:33.411
8   23:59:23.428
9   23:59:33.428
10  23:59:43.429
11  23:59:53.435
12  00:00:03.429
13  00:00:13.423
14  00:00:23.417
15  00:00:33.411

td = pd.to_timedelta(df[0])
days = pd.to_timedelta(td.diff().lt(pd.Timedelta(0)).cumsum(), unit='d')

df['new'] = td + days
df['newint'] = (df['new'].dt.total_seconds() * 1000).astype(int)
print (df)
               0                    new     newint
0   23:59:23.428 0 days 23:59:23.428000   86363428
1   23:59:33.428 0 days 23:59:33.428000   86373428
2   23:59:43.429 0 days 23:59:43.429000   86383429
3   23:59:53.435 0 days 23:59:53.435000   86393435
4   00:00:03.429 1 days 00:00:03.429000   86403429
5   00:00:13.423 1 days 00:00:13.423000   86413423
6   00:00:23.417 1 days 00:00:23.417000   86423417
7   00:00:33.411 1 days 00:00:33.411000   86433411
8   23:59:23.428 1 days 23:59:23.428000  172763428
9   23:59:33.428 1 days 23:59:33.428000  172773428
10  23:59:43.429 1 days 23:59:43.429000  172783429
11  23:59:53.435 1 days 23:59:53.435000  172793435
12  00:00:03.429 2 days 00:00:03.429000  172803429
13  00:00:13.423 2 days 00:00:13.423000  172813423
14  00:00:23.417 2 days 00:00:23.417000  172823417
15  00:00:33.411 2 days 00:00:33.411000  172833411
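
The multi-day steps above can be wrapped in a small helper. This is only a sketch: the function name `unwrap_clock_times` is hypothetical, and it assumes the rows are in chronological order with less than 24 hours between consecutive entries (a longer gap cannot be detected from the time of day alone). Multiplying the rollover count by `pd.Timedelta(days=1)` is used here as an equivalent of the `pd.to_timedelta(..., unit='d')` call.

```python
import pandas as pd

def unwrap_clock_times(times: pd.Series) -> pd.Series:
    """Convert HH:MM:SS.mmm strings to milliseconds, adding one day per rollover.

    Hypothetical helper; assumes chronological rows less than 24 hours apart.
    """
    td = pd.to_timedelta(times)
    # Count how many times the clock has jumped backwards so far
    rollovers = td.diff().lt(pd.Timedelta(0)).cumsum()
    # Add one day per rollover seen up to and including each row
    unwrapped = td + rollovers * pd.Timedelta(days=1)
    # Integer milliseconds via floor division by a 1 ms Timedelta
    return unwrapped // pd.Timedelta(milliseconds=1)

times = pd.Series(["23:59:53.435", "00:00:03.429", "23:59:23.428", "00:00:13.423"])
result = unwrap_clock_times(times)
print(result)
```

Usage on a file-based frame would look like `df['newint'] = unwrap_clock_times(df[0])`.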

EDIT:

Explanation of days:

First get the differences with diff http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.diff.html:

print (td.diff())
0                         NaT
1                    00:00:10
2             00:00:10.001000
3             00:00:10.006000
4    -1 days +00:00:09.994000
5             00:00:09.994000
6             00:00:09.994000
7             00:00:09.994000
8             23:58:50.017000
9                    00:00:10
10            00:00:10.001000
11            00:00:10.006000
12   -1 days +00:00:09.994000
13            00:00:09.994000
14            00:00:09.994000
15            00:00:09.994000
Name: 0, dtype: timedelta64[ns]

Then compare with lt http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.lt.html (<) to find the negative Timedeltas:

print (td.diff().lt(pd.Timedelta(0)))
0     False
1     False
2     False
3     False
4      True
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12     True
13    False
14    False
15    False
Name: 0, dtype: bool

Get the cumulative sum with cumsum http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.cumsum.html:

print (td.diff().lt(pd.Timedelta(0)).cumsum())
0     0
1     0
2     0
3     0
4     1
5     1
6     1
7     1
8     1
9     1
10    1
11    1
12    2
13    2
14    2
15    2
Name: 0, dtype: int32

Finally, convert to day timedeltas:

days = pd.to_timedelta(td.diff().lt(pd.Timedelta(0)).cumsum(), unit='d')

print (days)
0    0 days
1    0 days
2    0 days
3    0 days
4    1 days
5    1 days
6    1 days
7    1 days
8    1 days
9    1 days
10   1 days
11   1 days
12   2 days
13   2 days
14   2 days
15   2 days
Name: 0, dtype: timedelta64[ns]

EDIT:

The same idea can be used in your solution:

...
# Parentheses let the expression continue onto the next line
df['Total Time(ms)'] = (df['Hours']*3600000 + df['Minutes']*60000 +
                        df['Seconds']*1000 + df['Milliseconds'])

s = df['Total Time(ms)'].diff().lt(0).cumsum() * 24 * 60 * 60 * 1000
df['newint'] = s + df['Total Time(ms)']

print (df)
               0  Hours  Minutes  Seconds  Milliseconds  Total Time(ms)  \
0   23:59:23.428     23       59       23           428        86363428   
1   23:59:33.428     23       59       33           428        86373428   
2   23:59:43.429     23       59       43           429        86383429   
3   23:59:53.435     23       59       53           435        86393435   
4   00:00:03.429      0        0        3           429            3429   
5   00:00:13.423      0        0       13           423           13423   
6   00:00:23.417      0        0       23           417           23417   
7   00:00:33.411      0        0       33           411           33411   
8   23:59:23.428     23       59       23           428        86363428   
9   23:59:33.428     23       59       33           428        86373428   
10  23:59:43.429     23       59       43           429        86383429   
11  23:59:53.435     23       59       53           435        86393435   
12  00:00:03.429      0        0        3           429            3429   
13  00:00:13.423      0        0       13           423           13423   
14  00:00:23.417      0        0       23           417           23417   
15  00:00:33.411      0        0       33           411           33411   

       newint  
0    86363428  
1    86373428  
2    86383429  
3    86393435  
4    86403429  
5    86413423  
6    86423417  
7    86433411  
8   172763428  
9   172773428  
10  172783429  
11  172793435  
12  172803429  
13  172813423  
14  172823417  
15  172833411 
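
As a quick sanity check, the integer variant and the Timedelta variant can be compared on the same data. The sketch below is illustrative only: the sample `times` series and the names `MS_PER_DAY`, `int_result` and `td_result` are assumptions, and the regular expression assumes exactly three fractional digits, as in the log shown in the question.

```python
import pandas as pd

times = pd.Series(["23:59:53.435", "00:00:03.429", "23:59:23.428", "00:00:13.423"])

# Integer route: split into fields and sum up the milliseconds
parts = times.str.extract(r'(\d+):(\d+):(\d+)\.(\d+)').astype(int)
parts.columns = ['Hours', 'Minutes', 'Seconds', 'Milliseconds']
total_ms = (parts['Hours'] * 3600000 + parts['Minutes'] * 60000
            + parts['Seconds'] * 1000 + parts['Milliseconds'])

MS_PER_DAY = 24 * 60 * 60 * 1000
# Each backward jump adds one more day to all following rows
int_result = total_ms + total_ms.diff().lt(0).cumsum() * MS_PER_DAY

# Timedelta route: same logic expressed with Timedeltas
td = pd.to_timedelta(times)
day_offsets = td.diff().lt(pd.Timedelta(0)).cumsum() * pd.Timedelta(days=1)
td_result = (td + day_offsets) // pd.Timedelta(milliseconds=1)

# Both routes should agree row by row
print(int_result.tolist() == td_result.tolist())
```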