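The error comes from passing the entire column to fuzz. Calling str() on a column gives the printed representation of the whole Series, not a single address. If you do the following instead, i.e. apply fuzz to an individual row, you get the same result: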
test_anui = test_anui[(test_anui['Address Similarity'].isnull()) & (test_anui['Address Similarity'] != '')].copy()  # .copy() avoids SettingWithCopyWarning on the next assignment
test_anui['Address Similarity 2'] = fuzz.token_sort_ratio(str(test_anui.at[0,'Processed Client Address']), str(test_anui.at[0,'Processed Aruvio Address']))
print('the address similarity is different? ', fuzz.token_sort_ratio(address_a, address_b))
Or, using .loc:
test_anui = test_anui[(test_anui['Address Similarity'].isnull()) & (test_anui['Address Similarity'] != '')].copy()  # .copy() avoids SettingWithCopyWarning on the next assignment
test_anui['Address Similarity 2'] = fuzz.token_sort_ratio(str(test_anui.loc[0,'Processed Client Address']), str(test_anui.loc[0,'Processed Aruvio Address']))
print('the address similarity is different? ', fuzz.token_sort_ratio(address_a, address_b))
The output in the DataFrame is:
Processed Client Name Processed Aruvio Name \
0 anhui jinhan clothing co ltd anhui jinhan clothing co ltd
Processed Client Address \
0 high new technology development zones huainan ...
Processed Aruvio Address Name Similarity Address Similarity \
0 industrial park of funan city 89.285714 NaN
Address Similarity 2
0 28.099174
and the output of fuzz.token_sort_ratio(address_a, address_b) is 28.099173553719012.
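To see why passing the whole column gives a different score, here is a minimal sketch, assuming the original code called something like str(test_anui['Processed Client Address']). The two-row DataFrame `df` and its addresses are hypothetical stand-ins for test_anui. The point is that str() of a column is the printed Series (index labels, every row, and a "Name: ..." footer), so fuzz ends up comparing those strings rather than the addresses themselves.

```python
import pandas as pd

# Hypothetical two-row frame standing in for test_anui
df = pd.DataFrame({
    "Processed Client Address": ["high new technology development zones huainan", "12 main street"],
    "Processed Aruvio Address": ["industrial park of funan city", "12 main st"],
})

# str() of a whole column: the printed Series, including the index and footer
whole_column = str(df["Processed Client Address"])

# str() of a single cell: just the address in row 0
single_cell = str(df.loc[0, "Processed Client Address"])

print(whole_column)
print(single_cell)
```

Only `single_cell` is the text you actually want to score.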
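In other words, you need to specify the row from which to take the strings. I assume your DataFrame has several rows, which means you have to do this for each row: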
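# iterate over the index labels (len() returns an int, which is not iterable)
# and write each score into its own row instead of overwriting the whole column
for i in test_anui.index:
    test_anui.loc[i, 'Address Similarity 2'] = fuzz.token_sort_ratio(str(test_anui.loc[i, 'Processed Client Address']),
                                                                     str(test_anui.loc[i, 'Processed Aruvio Address']))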
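The per-row loop can also be written as a list comprehension over the two columns, which avoids repeated .loc lookups. This is only a sketch: so that it runs without a fuzzy-matching package, it uses a hypothetical `token_sort_ratio_sketch` built on difflib as a stand-in for fuzz.token_sort_ratio. It will not reproduce the library's exact scores; with your real data you would pass fuzz.token_sort_ratio in its place.

```python
import difflib

import pandas as pd


def token_sort_ratio_sketch(a: str, b: str) -> float:
    """Hypothetical stand-in for fuzz.token_sort_ratio: lowercase the text,
    sort the whitespace-separated tokens, then score with difflib (0-100)."""
    sorted_a = " ".join(sorted(str(a).lower().split()))
    sorted_b = " ".join(sorted(str(b).lower().split()))
    return 100 * difflib.SequenceMatcher(None, sorted_a, sorted_b).ratio()


# Hypothetical data standing in for test_anui
df = pd.DataFrame({
    "Processed Client Address": ["main street 12", "industrial park of funan city"],
    "Processed Aruvio Address": ["12 main street", "high new technology zone"],
})

# zip walks the two columns in parallel, so each score uses one row's pair of cells
df["Address Similarity 2"] = [
    token_sort_ratio_sketch(a, b)
    for a, b in zip(df["Processed Client Address"], df["Processed Aruvio Address"])
]
print(df)
```

Row 0 scores 100 because both addresses contain the same tokens in a different order, which is the case token sorting is meant to handle.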