I have a huge string, for example:
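The Dormouse's story. Once upon a time there were three little sisters; and their names were Elsie, Lacie and Tillie; and they lived at the bottom of a well....badword...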
I have a list of about 400 bad words:
bad_words = ["badword", "badword1"]  # .... about 400 words in total
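What is the most efficient way to check whether the text contains any word from the bad-word list?
I could loop over the text and the list, like this: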
# split() yields words; iterating the string directly would yield single characters
for word in huge_string.split():
    for bw in bad_words:
        if bw in word:
            print("bad word is inside text")
But that feels very 90s to me.
Update: the bad words are all single words.
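Convert the text into a set of words and intersect it with a set of bad words. Set membership tests take amortized constant time, so the whole check scales roughly with the length of the text instead of text length times the size of the bad-word list: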
import re

text = "The Dormouse's story. Once upon a time there were three little sisters; and their names were Elsie, Lacie and Tillie; and they lived at the bottom of a well....badword..."
badwords = {"badword", "badword1"}  # .... plus the rest of the bad-word list
# Tokenize on word characters: text.split() would keep "well....badword..." as one token and miss "badword"
textwords = set(re.findall(r"[\w']+", text))
for badword in badwords.intersection(textwords):
    print("The bad word '{}' was found in the text".format(badword))
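If all you need is the yes/no answer the question asks for, you can stop at the first match instead of building the full intersection. The following is a minimal sketch under stated assumptions. The helper names `contains_bad_word` and `find_bad_words`, the `[\w']+` tokenizer, and the lowercase normalization are illustrative choices, not part of the original answer. The sketch also assumes the bad-word list is stored in lowercase.

```python
import re

# Hypothetical stand-in for the real ~400-word list; assumed to be lowercase.
BAD_WORDS = frozenset({"badword", "badword1"})

# Runs of word characters (plus apostrophes), so punctuation such as
# "well....badword..." is split into "well" and "badword".
WORD_RE = re.compile(r"[\w']+")


def contains_bad_word(text: str, bad_words: frozenset = BAD_WORDS) -> bool:
    """Return True if any lowercased token of `text` is in `bad_words`."""
    # Lazy generator: tokens are produced only as isdisjoint() asks for them.
    tokens = (m.group().lower() for m in WORD_RE.finditer(text))
    # isdisjoint() accepts any iterable and returns False at the first common element.
    return not bad_words.isdisjoint(tokens)


def find_bad_words(text: str, bad_words: frozenset = BAD_WORDS) -> list:
    """Return the distinct bad words found in `text`, sorted alphabetically."""
    tokens = (m.group().lower() for m in WORD_RE.finditer(text))
    return sorted(bad_words.intersection(tokens))


if __name__ == "__main__":
    text = ("The Dormouse's story. Once upon a time there were three little sisters; "
            "and their names were Elsie, Lacie and Tillie; and they lived at the "
            "bottom of a well....badword...")
    print(contains_bad_word(text))  # True
    print(find_bad_words(text))     # ['badword']
```

When `set.isdisjoint` receives a non-set argument, CPython iterates over it and returns as soon as it finds a shared element. Feeding it a generator therefore avoids tokenizing the rest of a huge string once a bad word has been seen.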