The code below splits the sentence into individual tokens, producing this output:
"Cloud" "computing" "is" "benefiting" "major" "manufacturing" "companies"
import en_core_web_sm
nlp = en_core_web_sm.load()
doc = nlp("Cloud computing is benefiting major manufacturing companies")
for token in doc:
    print(token.text)
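Ideally, I want "Cloud computing" to be read as a single unit, since it is technically one term.
Basically, I'm looking for bigrams. Is there anything in spaCy that supports bigrams or trigrams?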
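For the literal bigram/trigram question, a plain sliding window over the token strings is enough. As far as I know spaCy's core API has no dedicated n-gram helper, so the `ngrams` function below is hypothetical; it is a minimal sketch assuming you already have the tokens as a list of strings, e.g. `[token.text for token in doc]`.

```python
from typing import Iterator, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield every contiguous window of n tokens (hypothetical helper, not spaCy)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # zip over n staggered views of the sequence; zip stops at the shortest view
    return zip(*(tokens[i:] for i in range(n)))


tokens = ["Cloud", "computing", "is", "benefiting",
          "major", "manufacturing", "companies"]
bigrams = list(ngrams(tokens, 2))   # 6 adjacent pairs
trigrams = list(ngrams(tokens, 3))  # 5 adjacent triples
print(bigrams[0])                   # → ('Cloud', 'computing')
```

This yields every adjacent pair, including uninformative ones such as ('computing', 'is'), so on its own it does not tell you that "Cloud computing" is a phrase; the noun-chunk approach in the answer addresses that.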
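spaCy can detect noun chunks, so to treat a noun phrase as a single token, do the following:
- Detect the noun chunks (https://spacy.io/usage/linguistic-features#noun-chunks).
- Merge each noun chunk into a single token.
- Iterate over the tokens again: "Cloud computing" is now a single token, carrying the part of speech of its root noun.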
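>>> import spacy
>>> nlp = spacy.load("en_core_web_sm")  # the 'en' shortcut no longer works in spaCy v3
>>> doc = nlp("Cloud computing is benefiting major manufacturing companies")
>>> list(doc.noun_chunks)
[Cloud computing, major manufacturing companies]
>>> # Span.merge() was deprecated in v2.1 and removed in v3; use the retokenizer instead
>>> with doc.retokenize() as retokenizer:
...     for noun_phrase in list(doc.noun_chunks):
...         # the merged token takes its tag and lemma from the chunk's root token
...         retokenizer.merge(noun_phrase, attrs={"TAG": noun_phrase.root.tag_, "LEMMA": noun_phrase.root.lemma_})
...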
>>> [(token.text,token.pos_) for token in doc]
[('Cloud computing', 'NOUN'), ('is', 'VERB'), ('benefiting', 'VERB'), ('major manufacturing companies', 'NOUN')]
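Conceptually, the merge step just collapses token index ranges into one token. The pure-Python sketch below shows the idea on plain strings; `merge_spans` is a hypothetical helper, not part of spaCy, and it assumes non-overlapping half-open `(start, end)` offsets like those a noun chunk exposes as `Span.start` and `Span.end`. Unlike spaCy, which keeps each token's original trailing whitespace, it simply joins with a single space.

```python
from typing import List, Sequence, Tuple


def merge_spans(tokens: Sequence[str],
                spans: Sequence[Tuple[int, int]]) -> List[str]:
    """Collapse each half-open [start, end) span of tokens into one token.

    Hypothetical helper; spans must not overlap (as with spaCy noun chunks).
    """
    merged: List[str] = []
    i = 0
    for start, end in sorted(spans):
        if start < i:
            raise ValueError("overlapping spans")
        merged.extend(tokens[i:start])               # tokens before this span
        merged.append(" ".join(tokens[start:end]))   # the span as one token
        i = end
    merged.extend(tokens[i:])                        # tokens after the last span
    return merged


tokens = ["Cloud", "computing", "is", "benefiting",
          "major", "manufacturing", "companies"]
# Token offsets of the two noun chunks, as (chunk.start, chunk.end) would give them
chunks = [(0, 2), (4, 7)]
print(merge_spans(tokens, chunks))
# → ['Cloud computing', 'is', 'benefiting', 'major manufacturing companies']
```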