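I have a DataFrame with a single column that contains 1000 rows.
I need to compare every row with every other row and find the edit distance for all pairs. How can I compute the ratio or the distance in Python?
My DataFrame looks like this: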
#Df
StepDescription
click confirm button when done
you have logged on
please log in to proceed
click on confirm button
Dolb was released successfully
Enter your details
validate the statement
Aval was released sucessfully
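To run the answer's code without the CSV file, the example frame can be built inline. This is a minimal sketch: the list just copies the eight sample rows above, and the variable name `rows` is only for illustration.

```python
import pandas as pd

# The eight sample rows from the question, copied verbatim (typos included,
# because they affect the distances shown in the answer)
rows = [
    "click confirm button when done",
    "you have logged on",
    "please log in to proceed",
    "click on confirm button",
    "Dolb was released successfully",
    "Enter your details",
    "validate the statement",
    "Aval was released sucessfully",
]

# Single-column DataFrame using the same column name as the question's CSV
df = pd.DataFrame({"StepDescription": rows})
```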
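How can I compute the Levenshtein ratio for all of these?
I wrote code that iterates over the rows in a loop, but I don't know how to continue after the iteration.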
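import Levenshtein
import pandas as pd
# Raw string so the backslash in the Windows-style path is never read as an escape sequence
data_dist = pd.read_csv(r'path\Data_TestDescription.csv')
df = pd.DataFrame(data_dist)
for index, row in df.iterrows():
    pass  # stuck here: how do I compare this row with every other row?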
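To see what an edit (Levenshtein) distance actually counts, here is a minimal pure-Python sketch of the classic dynamic-programming recurrence: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. It is an illustration only, not the `Levenshtein` package's implementation, and the function name `levenshtein` is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the two-row dynamic-programming recurrence."""
    # prev[j] = distance between the first i-1 chars of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the chars match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```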
Since percentages were requested in the comments, I'll leave the accepted answer as it is and just add a new section:
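import numpy as np
import pandas as pd
from Levenshtein import distance
from itertools import product
#df = ...
# Edit distance for every ordered pair of rows (n * n pairs)
dist = [distance(*x) for x in product(df.StepDescription, repeat=2)]
# Reshape the flat list into an n x n matrix: cell (i, j) = distance(row i, row j)
dist_df = pd.DataFrame(np.array(dist).reshape(df.shape[0], df.shape[0]))
dist_df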
    0   1   2   3   4   5   6   7
0   0  23  23  13  29  25  25  28
1  23   0  18  18  23  18  18  23
2  23  18   0  20  25  21  19  24
3  13  18  20   0  27  19  21  26
4  29  23  25  27   0  26  23   5
5  25  18  21  19  26   0  19  25
6  25  18  19  21  23  19   0  21
7  28  23  24  26   5  25  21   0
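Edit distance is symmetric (distance(a, b) == distance(b, a)) and every string is at distance 0 from itself, so the `product` approach computes roughly twice the necessary work: for 1000 rows that is 1,000,000 calls versus 499,500 with `itertools.combinations`. A minimal sketch of filling one triangle and mirroring it, assuming a pure-Python stand-in (the hypothetical `levenshtein`) for `Levenshtein.distance` so it runs without the package; `distance_matrix` is likewise a hypothetical name.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    # Stand-in for Levenshtein.distance: two-row dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def distance_matrix(texts: list[str]) -> list[list[int]]:
    """Symmetric n x n edit-distance matrix, computing each unordered pair once."""
    n = len(texts)
    matrix = [[0] * n for _ in range(n)]  # the diagonal stays 0
    for i, j in combinations(range(n), 2):  # only pairs with i < j
        d = levenshtein(texts[i], texts[j])
        matrix[i][j] = matrix[j][i] = d  # mirror into the lower triangle
    return matrix


m = distance_matrix(["kitten", "sitting", "mitten"])
print(m)  # [[0, 3, 1], [3, 0, 3], [1, 3, 0]]
```

The result can be wrapped with `pd.DataFrame(m)` to get the same layout as `dist_df`; swapping `levenshtein` for `Levenshtein.distance` should be much faster, since that package is implemented in C.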
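# Express each distance relative to the smallest non-zero distance (5 here), which becomes 100.
# Multiply before the floor division; `dist_df // m * 100` would truncate 23 / 5 to 4 and give 400 instead of 460.
dist_df_percentage = dist_df * 100 // min(x for x in dist if x > 0)
dist_df_percentage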
     0    1    2    3    4    5    6    7
0    0  460  460  260  580  500  500  560
1  460    0  360  360  460  360  360  460
2  460  360    0  400  500  420  380  480
3  260  360  400    0  540  380  420  520
4  580  460  500  540    0  520  460  100
5  500  360  420  380  520    0  380  500
6  500  360  380  420  460  380    0  420
7  560  460  480  520  100  500  420    0
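These percentages are relative to the smallest non-zero distance in this particular dataset, so they shift whenever the data changes. If a per-pair score in [0, 1] is wanted instead, one common normalization (an assumption here, not the answer's method) divides by the longer string's length: similarity = 1 - d / max(len(a), len(b)). The python-Levenshtein package also provides `Levenshtein.ratio`, but it uses its own normalization, so its values will not match this sketch exactly. A minimal pure-Python sketch; `levenshtein` and `similarity` are hypothetical helper names.

```python
def levenshtein(a: str, b: str) -> int:
    # Two-row dynamic-programming edit distance (stand-in for Levenshtein.distance)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized edit similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1 - levenshtein(a, b) / longest


score = similarity("Dolb was released successfully", "Aval was released sucessfully")
print(f"{score:.0%}")  # 83%
```

To get the full matrix, the same function can be used in place of `distance` in the answer's `product` comprehension.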