pd.merge_asof fails on the second run with "ValueError: left keys must be sorted"

2024-04-28

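Hi, I am trying to merge two datasets on the closest matching datetime.

Each station record has two timestamps, one for the open event and one for the close event.

merge_asof works fine on the open timestamp, but raises 'ValueError: left keys must be sorted' on the close timestamp.

In both cases I sort by the relevant datetime column before merging.

First dataframe: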

   idtbl_station_manager     date_time_stamp fld_station_number  \
0                   1121 2017-09-19 15:41:24            AM00571   
1                   1122 2017-09-19 15:41:24            AM00572   
2                   1123 2017-09-19 15:41:24            AM00573   

  fld_grid_number fld_status  fld_station_number_int  \
0     VOY-024-001     CLOSED                     571   
1     VOY-024-002     CLOSED                     572   
2     VOY-024-003     CLOSED                     573   

                  fld_activities date_time_stamp_open fld_lat_open  \
0  Drift Net,CTD-Overside,Dredge  2017-04-13 07:23:35                
1  Drift Net,CTD-Overside,Dredge  2017-04-13 10:15:07   4649.028 S   
2  Drift Net,CTD-Overside,Dredge  2017-04-13 13:15:42   4648.497 S   

  fld_lon_open date_time_stamp_close fld_lat_close fld_lon_close  
0  03759.143 E   2017-04-13 09:51:18    4647.361 S   03759.142 E  
1  03759.143 E   2017-04-13 12:11:00    4647.344 S   03759.143 E  
2                2017-04-13 15:09:26    4647.344 S   03759.143 E  

Second dataframe:

         idtbl_gpgga     date_time_stamp    fld_utc   fld_lat fld_lat_dir  \
1179828      1179829 2017-04-04 02:00:04  000005.00  3354.138           S   
0                  1 2017-04-04 02:00:05  000006.00  3354.138           S   
1                  2 2017-04-04 02:00:07  000008.00  3354.138           S   

          fld_lon fld_lon_dir fld_gps_quality fld_nos fld_hdop fld_alt  \
1179828  1825.557           E               1      10      0.9    21.6   
0        1825.557           E               1      10      0.9    21.6   
1        1825.557           E               1      10      0.9    21.6   

        fld_unit_alt fld_alt_geoid fld_unit_alt_geoid fld_dgps_age fld_dgps_id  
1179828            M          31.9                  M                        0  
0                  M          31.9                  M                        0  
1                  M          31.9                  M                        0  

This works as expected:

# First we grab the open time lat and lons

# Sort by date_times used for merge
df_stationManager.sort_values("date_time_stamp_open", inplace=True)
df_gpgga.sort_values("date_time_stamp", inplace=True)

#merge_asof used to get closest match on datetime
pd_open = pd.merge_asof(df_stationManager, df_gpgga, left_on=['date_time_stamp_open'], right_on=['date_time_stamp'], direction="nearest")

pd_open["fld_lat_open"] = pd_open["fld_lat"] + ' ' +  pd_open["fld_lat_dir"]
pd_open["fld_lon_open"] = pd_open["fld_lon"] + ' ' +  pd_open["fld_lon_dir"]     

This fails with:

'ValueError: left keys must be sorted'

# Now we grab the close time lat and lons

# Sort by date_times used for merge
df_stationManager.sort_values("date_time_stamp_close", inplace=True)
df_gpgga.sort_values("date_time_stamp", inplace=True)

#merge_asof used to get closest match on datetime
pd_close = pd.merge_asof(df_stationManager, df_gpgga, left_on=['date_time_stamp_close'], right_on=['date_time_stamp'], direction="nearest")

pd_close["fld_lat_close"] = pd_close["fld_lat"] + ' ' +  pd_close["fld_lat_dir"]
pd_close["fld_lon_close"] = pd_close["fld_lon"] + ' ' +  pd_close["fld_lon_dir"]

Any suggestions would be greatly appreciated.


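As @JohnE pointed out, the df_stationManager dataframe contained NaT values. sort_values places NaT rows at the end, so the key column is still not a valid sorted key for merge_asof.

Solved by cleaning the dataframe before merging: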

df_stationManager = df_stationManager.dropna() 
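
A minimal sketch of the failure and the fix, using two tiny made-up frames in place of the real df_stationManager and df_gpgga (the GPS timestamps below are invented for illustration). It assumes the missing values are in date_time_stamp_close, and it restricts dropna to that column with subset, which is a variation on the fix above: a bare dropna() also discards rows that are missing any other field. The exact error text may depend on the pandas version; newer releases appear to report null merge keys explicitly rather than "left keys must be sorted".

```python
import pandas as pd

# Hypothetical miniature frames; the GPS values are invented for illustration.
stations = pd.DataFrame({
    "fld_station_number": ["AM00571", "AM00572", "AM00573"],
    "date_time_stamp_close": pd.to_datetime(
        ["2017-04-13 09:51:18", None, "2017-04-13 15:09:26"]
    ),
})
gps = pd.DataFrame({
    "date_time_stamp": pd.to_datetime(
        ["2017-04-13 09:51:20", "2017-04-13 15:09:25"]
    ),
    "fld_lat": ["4647.361", "4647.344"],
})

key = "date_time_stamp_close"

# Sorting does not remove NaT; by default the NaT rows are placed last.
stations = stations.sort_values(key)
gps = gps.sort_values("date_time_stamp")

# Count missing keys before merging; a non-zero count explains the error.
n_missing = int(stations[key].isna().sum())
print("rows with missing close time:", n_missing)

# With NaT in the left key, merge_asof raises ValueError.
try:
    pd.merge_asof(stations, gps, left_on=key,
                  right_on="date_time_stamp", direction="nearest")
    error = None
except ValueError as exc:
    error = str(exc)
print("merge_asof error:", error)

# Drop only the rows whose merge key is missing, then merge again.
cleaned = stations.dropna(subset=[key])
pd_close = pd.merge_asof(cleaned, gps, left_on=key,
                         right_on="date_time_stamp", direction="nearest")
print(pd_close[["fld_station_number", key, "fld_lat"]])
```

Checking isna().sum() on the key column before each merge_asof call is a cheap way to tell this case apart from a genuinely unsorted key.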