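My dataset has more than 500,000 rows. For each id, I need the Hausdorff distance between that id and every other id, repeated across the whole dataset.
The dataset is huge. Here is a small part of it: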
df =
id_easy ordinal latitude longitude epoch day_of_week
0 aaa 1.0 22.0701 2.6685 01-01-11 07:45 Friday
1 aaa 2.0 22.0716 2.6695 01-01-11 07:45 Friday
2 aaa 3.0 22.0722 2.6696 01-01-11 07:46 Friday
3 bbb 1.0 22.1166 2.6898 01-01-11 07:58 Friday
4 bbb 2.0 22.1162 2.6951 01-01-11 07:59 Friday
5 ccc 1.0 22.1166 2.6898 01-01-11 07:58 Friday
6 ccc 2.0 22.1162 2.6951 01-01-11 07:59 Friday
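To make the example reproducible, here is a minimal sketch that rebuilds the sample rows above and collects one coordinate array per `id_easy`. It assumes, as in the `u` and `v` arrays below, that points are taken in (longitude, latitude) order; the `coords` dictionary name is just an illustrative choice.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the table above
df = pd.DataFrame({
    "id_easy": ["aaa", "aaa", "aaa", "bbb", "bbb", "ccc", "ccc"],
    "ordinal": [1.0, 2.0, 3.0, 1.0, 2.0, 1.0, 2.0],
    "latitude": [22.0701, 22.0716, 22.0722, 22.1166, 22.1162, 22.1166, 22.1162],
    "longitude": [2.6685, 2.6695, 2.6696, 2.6898, 2.6951, 2.6898, 2.6951],
    "epoch": ["01-01-11 07:45", "01-01-11 07:45", "01-01-11 07:46",
              "01-01-11 07:58", "01-01-11 07:59", "01-01-11 07:58", "01-01-11 07:59"],
    "day_of_week": ["Friday"] * 7,
})

# One (n_points, 2) array per id, in (longitude, latitude) order like u and v.
# Sorting by `ordinal` is only for readability: the Hausdorff distance
# treats each trajectory as an unordered set of points.
coords = {
    key: group.sort_values("ordinal")[["longitude", "latitude"]].to_numpy()
    for key, group in df.groupby("id_easy")
}
print(coords["aaa"])
```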
I want to compute the Hausdorff distance:
import pandas as pd
import numpy as np
from scipy.spatial.distance import directed_hausdorff
from scipy.spatial.distance import pdist, squareform
u = np.array([(2.6685,22.0701),(2.6695,22.0716),(2.6696,22.0722)]) # coordinates of `id_easy` of taxi `aaa`
v = np.array([(2.6898,22.1166),(2.6951,22.1162)]) # coordinates of `id_easy` of taxi `bbb`
directed_hausdorff(u, v)[0]
The output is 0.05114626086039758.
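Note that `directed_hausdorff(u, v)` is one-directional: it is the largest distance from a point of `u` to its nearest point in `v`, so in general `directed_hausdorff(u, v)` and `directed_hausdorff(v, u)` differ. The (symmetric) Hausdorff distance is the maximum of the two directions. The sketch below computes both, assuming the symmetric version may be what is wanted; since the coordinates are raw degrees, the result is in degree units, not metres.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

u = np.array([(2.6685, 22.0701), (2.6695, 22.0716), (2.6696, 22.0722)])  # taxi aaa
v = np.array([(2.6898, 22.1166), (2.6951, 22.1162)])  # taxi bbb

d_uv = directed_hausdorff(u, v)[0]  # max over u of the distance to the nearest point of v
d_vu = directed_hausdorff(v, u)[0]  # same, in the opposite direction
hausdorff = max(d_uv, d_vu)         # symmetric Hausdorff distance
print(d_uv, d_vu, hausdorff)
```

For this pair, `d_uv` is the 0.0511... value from the question and `d_vu` is slightly smaller, so the symmetric distance equals `d_uv` here; for other pairs the larger value may come from the other direction.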
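Now I want to compute this distance over the whole dataset, i.e. for every pair of id_easy values. The desired output is a matrix with 0 on the diagonal (because the distance between aaa and aaa is 0):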
     aaa      bbb      ccc
aaa  0        0.05114  ...
bbb  ...      0
ccc                    0
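A minimal sketch of building that matrix, assuming the input has the `id_easy`, `longitude` and `latitude` columns shown above. Entry (row r, column c) is `directed_hausdorff(points of r, points of c)`, matching the layout of the desired output, so the matrix is not symmetric; the hypothetical `symmetric=True` option fills both cells with the maximum of the two directions instead. The function name `hausdorff_matrix` is illustrative, not an existing API.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.spatial.distance import directed_hausdorff


def hausdorff_matrix(df, symmetric=False):
    """Pairwise Hausdorff distances between the trajectories of all id_easy values."""
    # One (n_points, 2) array per id, in (longitude, latitude) order
    coords = {
        key: group[["longitude", "latitude"]].to_numpy()
        for key, group in df.groupby("id_easy")
    }
    ids = list(coords)
    pos = {key: i for i, key in enumerate(ids)}
    out = np.zeros((len(ids), len(ids)))  # diagonal stays 0
    # Each unordered pair is visited once; both directions are computed
    for a, b in combinations(ids, 2):
        d_ab = directed_hausdorff(coords[a], coords[b])[0]
        d_ba = directed_hausdorff(coords[b], coords[a])[0]
        i, j = pos[a], pos[b]
        if symmetric:
            out[i, j] = out[j, i] = max(d_ab, d_ba)
        else:
            out[i, j], out[j, i] = d_ab, d_ba
    return pd.DataFrame(out, index=ids, columns=ids)


df = pd.DataFrame({
    "id_easy": ["aaa", "aaa", "aaa", "bbb", "bbb", "ccc", "ccc"],
    "latitude": [22.0701, 22.0716, 22.0722, 22.1166, 22.1162, 22.1166, 22.1162],
    "longitude": [2.6685, 2.6695, 2.6696, 2.6898, 2.6951, 2.6898, 2.6951],
})
m = hausdorff_matrix(df)
print(m.round(5))
```

The `aaa`/`bbb` cell reproduces the 0.05114626086039758 from the single-pair call, and `bbb`/`ccc` is 0 because the two sample trajectories are identical. The loop runs over k(k-1)/2 pairs for k distinct ids, so with many ids the pair count, rather than the 500,000 rows themselves, is what dominates the run time.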