The following strings should be considered equal. How can I match strings like these?
"Hazard Const. Company"
"hazard construction company"
"PETERSON-CHASE GENERAL ENGINEERING CONSTRUCTION INC"
"peterson-chase general engineering construction inc"
"TRAFFIC DEVELOPMENT SERVICES "
"traffic development services"
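My environment is Ruby, but I'm mainly after the general principles of matching strings. Because of whitespace differences and abbreviations, the examples above don't match with a basic "a" == "b" comparison. I can work around the case differences with a case-insensitive regex or by downcasing the strings...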
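One general principle is to normalize both strings before comparing them: fold case, strip punctuation and surrounding whitespace, collapse repeated spaces, and expand known abbreviations. The sketch below assumes ASCII company names; the method `normalize_name` and the `ABBREVIATIONS` table are hypothetical and only cover the sample data.

```ruby
# Hypothetical abbreviation table: the entries are illustrative
# assumptions, not a standard list; extend it for your own data.
ABBREVIATIONS = {
  "const" => "construction",
  "co"    => "company",
  "svcs"  => "services"
}.freeze

# Lower-case, replace punctuation (except hyphens) with spaces,
# collapse whitespace via split/join, then expand known abbreviations.
# Assumes ASCII names: other letters are treated as punctuation.
def normalize_name(name)
  words = name.downcase.gsub(/[^a-z0-9\s-]/, " ").split
  words.map { |w| ABBREVIATIONS.fetch(w, w) }.join(" ")
end

p normalize_name("Hazard Const. Company")         # => "hazard construction company"
p normalize_name("TRAFFIC DEVELOPMENT SERVICES ") # => "traffic development services"
```

With "const" in the table, all three pairs become exactly equal after normalization. The weakness is that the table must anticipate every abbreviation; fuzzy matching by edit distance covers variants the table misses.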
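The following example compares the strings using the edit (Levenshtein) distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other.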
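To show what the edit distance computes without depending on a gem, here is a sketch of the classic dynamic-programming algorithm. The method name `levenshtein` is hypothetical, and a gem's own implementation may differ internally while returning the same values.

```ruby
# Levenshtein distance via dynamic programming, keeping only two rows.
# prev[j] = distance between the first i-1 chars of a and the first j chars of b.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [
        prev[j] + 1,        # delete ca
        curr[j - 1] + 1,    # insert cb
        prev[j - 1] + cost  # substitute ca with cb (free if equal)
      ].min
    end
    prev = curr
  end
  prev.last
end

puts levenshtein("kitten", "sitting")                                    # prints 3
puts levenshtein("hazard const. company", "hazard construction company") # prints 7
```

Only two rows are kept, so memory is proportional to the length of the second string while time is proportional to the product of both lengths.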
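Two strings are treated as the same if their distance is below a defined maximum plus a compensation for string length, so longer strings tolerate more edits. Each group is stored in a hash whose key is the first string seen and whose value is the number of occurrences.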
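To make the threshold concrete: with MAX_DISTANCE = 3 and COMPENSATION = 5, integer division allows one extra edit per five characters of the incoming string. A small sketch (`limit_for` is a hypothetical helper) computing the limits for the sample strings:

```ruby
MAX_DISTANCE, COMPENSATION = 3, 5

# Length-dependent limit; Integer#/ truncates, so 27 / 5 == 5.
def limit_for(s)
  MAX_DISTANCE + s.length / COMPENSATION
end

["hazard construction company",
 "traffic development services",
 "peterson-chase general engineering construction inc"].each do |s|
  puts "#{s.length} chars -> distance must be < #{limit_for(s)}"
end
# 27 chars -> distance must be < 8
# 28 chars -> distance must be < 8
# 51 chars -> distance must be < 13
```

Because the comparison is strict, "hazard const. company" vs "hazard construction company" (distance 7) is accepted against a limit of 8, while unrelated names of similar length are far above it.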
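```ruby
# Needs a gem that provides Levenshtein.distance(a, b)
# (assumed: the levenshtein or levenshtein-ffi gem).
require 'levenshtein'

MAX_DISTANCE, COMPENSATION = 3, 5

strings = [
  "Hazard Const. Company",
  "hazard construction company",
  "PETERSON-CHASE GENERAL ENGINEERING CONSTRUCTION INC",
  "peterson-chase general engineering construction inc",
  "TRAFFIC DEVELOPMENT SERVICES ",
  "traffic development services"
]

# Maps the first string seen in each group to the number of similar strings.
result = {}

strings.each do |original|
  # Normalize case and surrounding whitespace without mutating the input.
  s = original.strip.downcase
  # Longer strings tolerate more edits: one extra edit per COMPENSATION chars.
  limit = MAX_DISTANCE + s.length / COMPENSATION
  key = result.keys.find { |k| Levenshtein.distance(k, s) < limit }

  if key
    result[key] += 1
  else
    result[s] = 1
  end
end

puts result.inspect
# {"hazard const. company"=>2, "peterson-chase general engineering construction inc"=>2, "traffic development services"=>2}
```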
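If the gem is unavailable, or you want to see which spellings ended up together, the same threshold rule can be sketched without dependencies. The names `edit_distance`, `normalize`, and `group_similar` are hypothetical; the grouping stores the original strings per group instead of a count.

```ruby
MAX_DISTANCE, COMPENSATION = 3, 5

# Two-row dynamic-programming Levenshtein distance.
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Trim, lower-case, and collapse runs of spaces.
def normalize(s)
  s.strip.downcase.squeeze(" ")
end

# Greedy grouping: each string joins the first existing group whose key
# is within the length-dependent limit, otherwise it starts a new group.
def group_similar(strings)
  groups = {}
  strings.each do |original|
    s = normalize(original)
    limit = MAX_DISTANCE + s.length / COMPENSATION
    key = groups.keys.find { |k| edit_distance(k, s) < limit }
    (groups[key || s] ||= []) << original
  end
  groups
end

groups = group_similar([
  "Hazard Const. Company", "hazard construction company",
  "TRAFFIC DEVELOPMENT SERVICES ", "traffic development services"
])
groups.each { |key, members| puts "#{key}: #{members.inspect}" }
```

Since each string is compared only against existing group keys, the result depends on input order; this is a simple greedy pass rather than a full clustering algorithm.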