我建议与Timedelta http://pandas.pydata.org/pandas-docs/stable/user_guide/timedeltas.htmls:
df = pd.read_csv("stackoverflow.txt", header=None)
首先将列转换为to_timedelta http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_timedelta.html,然后求差值,比较Timedelta(0)
并为下一行添加pd.Timedelta(24, 'h')
.
td = pd.to_timedelta(df[0])
df['new'] = td.mask(td.diff().lt(pd.Timedelta(0)).cumsum().gt(0), td + pd.Timedelta(1, 'days'))
df['newint'] = (df['new'].dt.total_seconds() * 1000).astype(int)
print (df)
0 new newint
0 23:58:03.458 0 days 23:58:03.458000 86283458
1 23:58:13.446 0 days 23:58:13.446000 86293446
2 23:58:23.447 0 days 23:58:23.447000 86303447
3 23:58:33.440 0 days 23:58:33.440000 86313440
4 23:58:43.440 0 days 23:58:43.440000 86323440
5 23:58:53.440 0 days 23:58:53.440000 86333440
6 23:59:03.434 0 days 23:59:03.434000 86343434
7 23:59:13.435 0 days 23:59:13.435000 86353435
8 23:59:23.428 0 days 23:59:23.428000 86363428
9 23:59:33.428 0 days 23:59:33.428000 86373428
10 23:59:43.429 0 days 23:59:43.429000 86383429
11 23:59:53.435 0 days 23:59:53.435000 86393435
12 00:00:03.429 1 days 00:00:03.429000 86403429
13 00:00:13.423 1 days 00:00:13.423000 86413423
14 00:00:23.417 1 days 00:00:23.417000 86423417
15 00:00:33.411 1 days 00:00:33.411000 86433411
16 00:00:43.418 1 days 00:00:43.418000 86443418
17 00:00:53.411 1 days 00:00:53.411000 86453411
18 00:01:03.405 1 days 00:01:03.405000 86463405
19 00:01:13.406 1 days 00:01:13.406000 86473406
20 00:01:23.400 1 days 00:01:23.400000 86483400
21 00:01:33.406 1 days 00:01:33.406000 86493406
22 00:01:43.400 1 days 00:01:43.400000 86503400
23 00:01:53.411 1 days 00:01:53.411000 86513411
24 00:02:03.400 1 days 00:02:03.400000 86523400
25 00:02:13.406 1 days 00:02:13.406000 86533406
26 00:02:23.394 1 days 00:02:23.394000 86543394
27 00:02:33.400 1 days 00:02:33.400000 86553400
28 00:02:43.394 1 days 00:02:43.394000 86563394
解决方案是数据多天 - 因此对于第一次更改添加 1 天,接下来的 2 天......
创建差异,添加累积和并将输出转换为日时间增量,即添加到原始数据中的内容:
print (df)
0
0 23:59:23.428
1 23:59:33.428
2 23:59:43.429
3 23:59:53.435
4 00:00:03.429
5 00:00:13.423
6 00:00:23.417
7 00:00:33.411
8 23:59:23.428
9 23:59:33.428
10 23:59:43.429
11 23:59:53.435
12 00:00:03.429
13 00:00:13.423
14 00:00:23.417
15 00:00:33.411
td = pd.to_timedelta(df[0])
days = pd.to_timedelta(td.diff().lt(pd.Timedelta(0)).cumsum(), unit='d')
df['new'] = td + days
df['newint'] = (df['new'].dt.total_seconds() * 1000).astype(int)
print (df)
0 new newint
0 23:59:23.428 0 days 23:59:23.428000 86363428
1 23:59:33.428 0 days 23:59:33.428000 86373428
2 23:59:43.429 0 days 23:59:43.429000 86383429
3 23:59:53.435 0 days 23:59:53.435000 86393435
4 00:00:03.429 1 days 00:00:03.429000 86403429
5 00:00:13.423 1 days 00:00:13.423000 86413423
6 00:00:23.417 1 days 00:00:23.417000 86423417
7 00:00:33.411 1 days 00:00:33.411000 86433411
8 23:59:23.428 1 days 23:59:23.428000 172763428
9 23:59:33.428 1 days 23:59:33.428000 172773428
10 23:59:43.429 1 days 23:59:43.429000 172783429
11 23:59:53.435 1 days 23:59:53.435000 172793435
12 00:00:03.429 2 days 00:00:03.429000 172803429
13 00:00:13.423 2 days 00:00:13.423000 172813423
14 00:00:23.417 2 days 00:00:23.417000 172823417
15 00:00:33.411 2 days 00:00:33.411000 172833411
EDIT:
天数解释:
首先得到差异diff http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.diff.html:
print (td.diff())
0 NaT
1 00:00:10
2 00:00:10.001000
3 00:00:10.006000
4 -1 days +00:00:09.994000
5 00:00:09.994000
6 00:00:09.994000
7 00:00:09.994000
8 23:58:50.017000
9 00:00:10
10 00:00:10.001000
11 00:00:10.006000
12 -1 days +00:00:09.994000
13 00:00:09.994000
14 00:00:09.994000
15 00:00:09.994000
Name: 0, dtype: timedelta64[ns]
然后比较通过lt http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.lt.html (<
) 对于负 Timedeltas:
print (td.diff().lt(pd.Timedelta(0)))
0 False
1 False
2 False
3 False
4 True
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 True
13 False
14 False
15 False
Name: 0, dtype: bool
获取累计总和cumsum http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.cumsum.html:
print (td.diff().lt(pd.Timedelta(0)).cumsum())
0 0
1 0
2 0
3 0
4 1
5 1
6 1
7 1
8 1
9 1
10 1
11 1
12 2
13 2
14 2
15 2
Name: 0, dtype: int32
最后转换为天数 timedeltas:
days = pd.to_timedelta(td.diff().lt(pd.Timedelta(0)).cumsum(), unit='d')
print (days)
0 0 days
1 0 days
2 0 days
3 0 days
4 1 days
5 1 days
6 1 days
7 1 days
8 1 days
9 1 days
10 1 days
11 1 days
12 2 days
13 2 days
14 2 days
15 2 days
Name: 0, dtype: timedelta64[ns]
EDIT:
您的解决方案中可以使用相同的IDE:
...
df['Total Time(ms)'] = df['Hours']*3600000 + df['Minutes']*60000 +
df['Seconds']*1000 + df['Milliseconds']
s = df['Total Time(ms)'].diff().lt(0).cumsum() * 24 * 60 * 60 * 1000
df['newint'] = s + df['Total Time(ms)']
print (df)
0 Hours Minutes Seconds Milliseconds Total Time(ms) \
0 23:59:23.428 23 59 23 428 86363428
1 23:59:33.428 23 59 33 428 86373428
2 23:59:43.429 23 59 43 429 86383429
3 23:59:53.435 23 59 53 435 86393435
4 00:00:03.429 0 0 3 429 3429
5 00:00:13.423 0 0 13 423 13423
6 00:00:23.417 0 0 23 417 23417
7 00:00:33.411 0 0 33 411 33411
8 23:59:23.428 23 59 23 428 86363428
9 23:59:33.428 23 59 33 428 86373428
10 23:59:43.429 23 59 43 429 86383429
11 23:59:53.435 23 59 53 435 86393435
12 00:00:03.429 0 0 3 429 3429
13 00:00:13.423 0 0 13 423 13423
14 00:00:23.417 0 0 23 417 23417
15 00:00:33.411 0 0 33 411 33411
newint
0 86363428
1 86373428
2 86383429
3 86393435
4 86403429
5 86413423
6 86423417
7 86433411
8 172763428
9 172773428
10 172783429
11 172793435
12 172803429
13 172813423
14 172823417
15 172833411