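This happens because `ratio` uses the total length of both sequences when it computes the ratio, and it does not remove the elements flagged by `isjunk` from that total. So as long as the number of matched elements in the matching blocks is the same with and without `isjunk`, the ratio is the same.
I assume the sequences are not filtered by `isjunk` for performance reasons.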
def ratio(self):
    """Return a measure of the sequences' similarity (float in [0,1]).

    Where T is the total number of elements in both sequences, and
    M is the number of matches, this is 2.0*M / T.
    """

    matches = sum(triple[-1] for triple in self.get_matching_blocks())
    return _calculate_ratio(matches, len(self.a) + len(self.b))
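To make the formula concrete, here is a minimal sketch that recomputes 2.0*M / T by hand from the matching blocks and compares it with what `ratio()` returns. The helper name `manual_ratio` is made up for illustration and is not part of difflib.

```python
import difflib


def manual_ratio(matcher):
    """Recompute 2.0*M / T from the matching blocks (hypothetical helper)."""
    # M: total size of all matching blocks (the final sentinel block has size 0).
    matches = sum(block.size for block in matcher.get_matching_blocks())
    # T: combined length of both full sequences; junk elements are still counted.
    total = len(matcher.a) + len(matcher.b)
    # Mirror difflib's convention of returning 1.0 for two empty sequences.
    return 2.0 * matches / total if total else 1.0


plain = difflib.SequenceMatcher(None, "AA", "A A")
junked = difflib.SequenceMatcher(lambda x: x == ' ', "AA", "A A")

print(manual_ratio(plain), plain.ratio())
print(manual_ratio(junked), junked.ratio())
```

Both lines should print the same pair of numbers, since `isjunk` never enters the computation of T.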
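`self.a` and `self.b` are the strings (sequences) passed to the SequenceMatcher object ("AA" and "A A" in your example). The `isjunk` function, `lambda x: x in ' '`, is only used to determine the matching blocks. Your example is simple enough that both the resulting ratio and the matching blocks are identical for the two calls.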
difflib.SequenceMatcher(None, "AA", "A A").get_matching_blocks()
[Match(a=0, b=0, size=1), Match(a=1, b=2, size=1), Match(a=2, b=3, size=0)]
difflib.SequenceMatcher(lambda x: x == ' ', "AA", "A A").get_matching_blocks()
[Match(a=0, b=0, size=1), Match(a=1, b=2, size=1), Match(a=2, b=3, size=0)]
The matching blocks are the same, and so is the ratio: M = 2, T = len("AA") + len("A A") = 5 => ratio = 2.0 * 2 / 5
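For contrast, the sketch below shows what the number would look like if the junk were stripped out before comparing, so that it no longer counts toward T. This is not what difflib does with `isjunk`. The pre-filtering step and the names `is_junk`, `a_clean`, and `b_clean` are assumptions made for illustration.

```python
import difflib


def is_junk(ch):
    # Same junk rule as in the examples: a single space is junk.
    return ch == ' '


a, b = "AA", "A A"

# What difflib does: junk only affects block matching, T still counts it.
with_isjunk = difflib.SequenceMatcher(is_junk, a, b).ratio()

# Hypothetical alternative: drop the junk first, so T shrinks to 2 + 2 = 4.
a_clean = "".join(ch for ch in a if not is_junk(ch))
b_clean = "".join(ch for ch in b if not is_junk(ch))
prefiltered = difflib.SequenceMatcher(None, a_clean, b_clean).ratio()

print(with_isjunk)   # 2.0 * 2 / 5
print(prefiltered)   # 2.0 * 2 / 4
```

Only the second value changes, because only there does the space stop contributing to the total length.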
Now consider the following example:
difflib.SequenceMatcher(None, "AA ", "A A").get_matching_blocks()
[Match(a=1, b=0, size=2), Match(a=3, b=3, size=0)]
difflib.SequenceMatcher(lambda x: x == ' ', "AA ", "A A").get_matching_blocks()
[Match(a=0, b=0, size=1), Match(a=1, b=2, size=1), Match(a=3, b=3, size=0)]
The matching blocks now differ, but the ratio is the same because the number of matches is still equal:
When `isjunk` is None: M = 2, T = 6 => ratio = 2.0 * 2 / 6
When `isjunk` is `lambda x: x == ' '`: M = 1 + 1, T = 6 => ratio = 2.0 * 2 / 6
Finally, a case where the number of matches differs:
difflib.SequenceMatcher(None, "AA ", "A A ").get_matching_blocks()
[Match(a=1, b=0, size=2), Match(a=3, b=4, size=0)]
difflib.SequenceMatcher(lambda x: x == ' ', "AA ", "A A ").get_matching_blocks()
[Match(a=0, b=0, size=1), Match(a=1, b=2, size=2), Match(a=3, b=4, size=0)]
The number of matches is different:
When `isjunk` is None: M = 2, T = 7 => ratio = 2.0 * 2 / 7
When `isjunk` is `lambda x: x == ' '`: M = 1 + 2, T = 7 => ratio = 2.0 * 3 / 7
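The following sketch reproduces this last case and also prints the text each matching block covers, which makes it easier to see where the extra match comes from. The `results` dictionary and its labels are illustrative names only.

```python
import difflib

a, b = "AA ", "A A "

results = {}
for label, isjunk in (("None", None), ("space is junk", lambda x: x == ' ')):
    matcher = difflib.SequenceMatcher(isjunk, a, b)
    blocks = matcher.get_matching_blocks()
    # Text covered in `a` by each non-empty block (the sentinel has size 0).
    matched = [a[blk.a:blk.a + blk.size] for blk in blocks if blk.size]
    m = sum(blk.size for blk in blocks)   # M: number of matched elements
    t = len(a) + len(b)                   # T: junk is still counted here
    results[label] = (matched, m, t, matcher.ratio())
    print(f"{label}: matched={matched!r} M={m} T={t} ratio={matcher.ratio():.4f}")
```

With the space treated as junk, the second block should come out as "A " rather than "A": the space cannot start a match, but it can still be absorbed into a neighbouring match, and that is what raises M from 2 to 3 while T stays at 7.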