您好,我正在尝试合并最接近匹配的日期时间的两个数据集。
我有开放事件和封闭事件的两个时间戳。
merge_asof 在开放日期运行良好,但返回'ValueError:左键必须排序'在第二个日期时间。
我在这两种情况下都按相关的日期时间排序。
第一个数据框:
idtbl_station_manager date_time_stamp fld_station_number \
0 1121 2017-09-19 15:41:24 AM00571
1 1122 2017-09-19 15:41:24 AM00572
2 1123 2017-09-19 15:41:24 AM00573
fld_grid_number fld_status fld_station_number_int \
0 VOY-024-001 CLOSED 571
1 VOY-024-002 CLOSED 572
2 VOY-024-003 CLOSED 573
fld_activities date_time_stamp_open fld_lat_open \
0 Drift Net,CTD-Overside,Dredge 2017-04-13 07:23:35
1 Drift Net,CTD-Overside,Dredge 2017-04-13 10:15:07 4649.028 S
2 Drift Net,CTD-Overside,Dredge 2017-04-13 13:15:42 4648.497 S
fld_lon_open date_time_stamp_close fld_lat_close fld_lon_close
0 03759.143 E 2017-04-13 09:51:18 4647.361 S 03759.142 E
1 03759.143 E 2017-04-13 12:11:00 4647.344 S 03759.143 E
2 2017-04-13 15:09:26 4647.344 S 03759.143 E
第二个数据框:
idtbl_gpgga date_time_stamp fld_utc fld_lat fld_lat_dir \
1179828 1179829 2017-04-04 02:00:04 000005.00 3354.138 S
0 1 2017-04-04 02:00:05 000006.00 3354.138 S
1 2 2017-04-04 02:00:07 000008.00 3354.138 S
fld_lon fld_lon_dir fld_gps_quality fld_nos fld_hdop fld_alt \
1179828 1825.557 E 1 10 0.9 21.6
0 1825.557 E 1 10 0.9 21.6
1 1825.557 E 1 10 0.9 21.6
fld_unit_alt fld_alt_geoid fld_unit_alt_geoid fld_dgps_age fld_dgps_id
1179828 M 31.9 M 0
0 M 31.9 M 0
1 M 31.9 M 0
这按预期工作:
# First we grab the open time lat and lons
# Sort by date_times used for merge
df_stationManager.sort_values("date_time_stamp_open", inplace=True)
df_gpgga.sort_values("date_time_stamp", inplace=True)
#merge_asof used to get closest match on datetime
pd_open = pd.merge_asof(df_stationManager, df_gpgga, left_on=['date_time_stamp_open'], right_on=['date_time_stamp'], direction="nearest")
pd_open["fld_lat_open"] = pd_open["fld_lat"] + ' ' + pd_open["fld_lat_dir"]
pd_open["fld_lon_open"] = pd_open["fld_lon"] + ' ' + pd_open["fld_lon_dir"]
这失败了:
'ValueError:左键必须排序'
# Now we grab the close time lat and lons
# Sort by date_times used for merge
df_stationManager.sort_values("date_time_stamp_close", inplace=True)
df_gpgga.sort_values("date_time_stamp", inplace=True)
#merge_asof used to get closest match on datetime
pd_close = pd.merge_asof(df_stationManager, df_gpgga, left_on=['date_time_stamp_close'], right_on=['date_time_stamp'], direction="nearest")
pd_close["fld_lat_close"] = pd_close["fld_lat"] + ' ' + pd_close["fld_lat_dir"]
pd_close["fld_lat_close"] = pd_close["fld_lon"] + ' ' + pd_close["fld_lon_dir"]
任何建议将不胜感激。