Checking for (only whole) words in a string

2023-12-19

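I'm training on Checkio. The task is called Popular Words: given a string and a list of words (strings), count how often each listed word occurs in the string.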

For example:

textt="When I was One I had just begun When I was Two I was nearly new"

wwords=['i', 'was', 'three', 'near']

My code is as follows:

def popular_words(text: str, words: list) -> dict:
    # your code here

    occurence={}
    text=text.lower()


    for i in words:
        occurence[i]=(text.count(i))

    # incorrectly takes "nearly" as "near"


    print(occurence)
    return(occurence)

popular_words(textt,wwords)

It almost works, returning

{'i': 4, 'was': 3, 'three': 0, 'near': 1} 

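so "near" is counted as part of "nearly". This was obviously the task authors' intention. However, I can't work out how to get around it other than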

"search for words that are not first (index 0) or last (last index) and for these that begin/end with whitespace"

Can I ask for help? Please base your answer on this rather naive code.


You'd be better off splitting your sentence into words and then counting the words, not the substrings:

textt="When I was One I had just begun When I was Two I was nearly new"
wwords=['i', 'was', 'three', 'near']
text_words = textt.lower().split()
result = {w:text_words.count(w) for w in wwords}

print(result)

prints:

{'i': 4, 'was': 3, 'three': 0, 'near': 0}
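
Since you asked to stay close to your own code, the same idea can be folded back into your function. This is only a sketch: it keeps your signature and loop, and the one real change is that the count runs over a list of words instead of over the raw string.

```python
def popular_words(text: str, words: list) -> dict:
    # Split on whitespace so whole words are compared, not substrings.
    text_words = text.lower().split()

    occurrence = {}
    for w in words:
        # list.count matches whole list items, so "nearly" no longer counts as "near".
        occurrence[w] = text_words.count(w)
    return occurrence


textt = "When I was One I had just begun When I was Two I was nearly new"
wwords = ['i', 'was', 'three', 'near']
print(popular_words(textt, wwords))  # {'i': 4, 'was': 3, 'three': 0, 'near': 0}
```

Note that this assumes the words in `words` are already lowercase, as they are in the example.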

If the text contains punctuation, you're better off using a regular expression to split the string on non-alphanumeric characters:

import re

textt="When I was One, I had just begun.I was Two when I was nearly new"

wwords=['i', 'was', 'three', 'near']
text_words = re.split(r"\W+",textt.lower())
result = {w:text_words.count(w) for w in wwords}

result:

{'i': 4, 'was': 3, 'three': 0, 'near': 0}

(Another option is to use findall on word characters: text_words = re.findall(r"\w+",textt.lower()))
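
The two regex options are not quite identical, which a small comparison can show. In this sketch the sample string is made up for illustration: when the text starts or ends with punctuation, splitting leaves an empty string in the list, while findall returns only the words. For counting specific words this makes no difference, but it matters if you inspect the list itself.

```python
import re

sample = "hello, world!"  # illustrative input ending in punctuation

split_tokens = re.split(r"\W+", sample)
find_tokens = re.findall(r"\w+", sample)

print(split_tokens)  # ['hello', 'world', '']
print(find_tokens)   # ['hello', 'world']
```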

Now, if your list of "important" words is big, it may be better to count all the words first and filter afterwards, using the classic collections.Counter:

import collections

text_words = collections.Counter(re.split(r"\W+",textt.lower()))
result = {w:text_words[w] for w in wwords}  # indexing a Counter gives 0 for absent words; .get(w) would give None
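
Put together as one self-contained function, the Counter approach might look like the sketch below. The function name `count_selected_words` is made up here, and the tokenizing uses the findall variant rather than split. The text is scanned once, and each lookup afterwards is a dictionary access instead of a pass over the whole word list.

```python
import collections
import re


def count_selected_words(text, words):
    # Count every word in the text once.
    counts = collections.Counter(re.findall(r"\w+", text.lower()))
    # Indexing a Counter returns 0 for words that never appeared.
    return {w: counts[w] for w in words}


textt = "When I was One, I had just begun.I was Two when I was nearly new"
wwords = ['i', 'was', 'three', 'near']
print(count_selected_words(textt, wwords))  # {'i': 4, 'was': 3, 'three': 0, 'near': 0}
```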