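Hi, I'm trying to stem words with a Python stemmer. I tried both Porter and Lancaster, but they have the same problem: they don't correctly stem words that end in "er" or "e".
For example, they stem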
computer --> comput
rotate --> rotat
This is part of the code:
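# `stops`, `porter`, and `st` are defined elsewhere (not shown); `re` must be imported.
line = line.lower()
line = re.sub(r'[^a-z0-9 ]', ' ', line)  # replace everything except a-z, 0-9 and spaces
line = line.split()
line = [x for x in line if x not in stops]  # drop stop words
line = [porter.stem(word, 0, len(word) - 1) for word in line]
# or: line = [st.stem(word) for word in line]
return line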
Is there a way to fix this?
To quote the Wikipedia page http://en.wikipedia.org/wiki/Word_stem: "In computational linguistics, a stem is the part of the word that never changes even when morphologically inflected, whilst a lemma is the base form of the word. For example, given the word "produced", its lemma is "produce", however the stem is "produc": this is because there are words such as production."
So your code is probably giving you the correct result. You seem to be expecting a lemma, which is not what a stemmer produces (unless the lemma happens to be equal to the stem).
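To make the stem/lemma distinction concrete, here is a minimal sketch using only the standard library. It is a toy, not the Porter or Lancaster algorithm: the suffix list, the lemma table, and the names `toy_stem` and `toy_lemma` are all made up for illustration.

```python
# Toy illustration only: NOT the Porter or Lancaster algorithm.
# SUFFIXES is a hypothetical rule list, checked in order.
SUFFIXES = ("tion", "ing", "ed", "er", "e")

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word

# Hypothetical hand-written lemma table; a real lemmatizer relies on a full dictionary.
LEMMAS = {"produced": "produce", "rotating": "rotate"}

def toy_lemma(word):
    """Look up the dictionary form; fall back to the word itself."""
    return LEMMAS.get(word, word)

for w in ["computer", "rotate", "produced", "production"]:
    print(w, "stem:", toy_stem(w), "lemma:", toy_lemma(w))
```

The stems ("comput", "rotat", "produc") are not English words, but related forms collapse to the same string, which is all a stemmer aims for. If you need real words back, a dictionary-based lemmatizer (NLTK's `WordNetLemmatizer`, for example) fills the role that `toy_lemma` plays here.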