一对一合并使用merge_asof
Use a merge_asof https://pandas.pydata.org/docs/reference/api/pandas.merge_asof.html with tolerance
:
df1[['startTimeIso', 'endTimeIso']] = df1[['startTimeIso', 'endTimeIso']].apply(pd.to_datetime)
df2[['startTimeIso', 'endTimeIso']] = df2[['startTimeIso', 'endTimeIso']].apply(pd.to_datetime)
out = pd.merge_asof(
df2.sort_values(by='startTimeIso'),
df1.sort_values(by='startTimeIso')
.rename(columns={'startTimeIso': 'startTimeIso_y'}),
left_on='startTimeIso', right_on='startTimeIso_y',
direction='nearest', tolerance=pd.Timedelta('1s'),
suffixes=(None, '_y')
)
print(out)
Output:
startTimeIso endTimeIso value endTimeIso_y id
0 2023-03-07 03:28:57.169 2023-03-07 03:29:25.996 True 2023-03-07 03:29:25.396 5.0
1 2023-03-07 03:57:08.734 2023-03-07 03:58:08.734 True 2023-03-07 03:58:08.734 7.0
2 2023-03-07 05:38:08.734 2023-03-07 05:40:10.271 True NaT NaN
3 2023-03-07 07:58:08.934 2023-03-07 07:58:10.371 True 2023-03-07 07:58:10.271 21.0
如果您想考虑开始或结束,请执行两次合并并combine_first https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.combine_first.html:
out1 = pd.merge_asof(
df2.sort_values(by='startTimeIso').reset_index(),
df1.sort_values(by='startTimeIso')
.rename(columns={'startTimeIso': 'startTimeIso_y'}),
left_on='startTimeIso', right_on='startTimeIso_y',
direction='nearest', tolerance=pd.Timedelta('1s'),
suffixes=(None, '_y')
)
out2 = pd.merge_asof(
df2.sort_values(by='endTimeIso').reset_index(),
df1.sort_values(by='endTimeIso')
.rename(columns={'endTimeIso': 'endTimeIso_y'}),
left_on='endTimeIso', right_on='endTimeIso_y',
direction='nearest', tolerance=pd.Timedelta('1s'),
suffixes=(None, '_y')
)
out = out1.combine_first(out2).set_index('index')
print(out)
多对多合并使用numpy /questions/tagged/numpy
s1 = df1['startTimeIso'].to_numpy()[:,None]
s2 = df2['startTimeIso'].to_numpy()
e1 = df1['endTimeIso'].to_numpy()[:,None]
e2 = df2['endTimeIso'].to_numpy()
ms = abs(s1-s2) < pd.Timedelta('1s')
me = abs(e1-e2) < pd.Timedelta('1s')
idx1, idx2 = np.where(ms&me)
out = df2.join(df1.iloc[idx1].set_axis(df2.index[idx2])
.rename(columns={'startTimeIso': 'startTimeIso_y',
'endTimeIso': 'endTimeIso_y'}))