First, if you only need the pre-trained model of the Stanford CoreNLP dependency parser, use CoreNLPDependencyParser from nltk.parse.corenlp and avoid the old nltk.parse.stanford interface.
See "Stanford Parser and NLTK": https://stackoverflow.com/questions/13883277/stanford-parser-and-nltk/51981566#51981566
After downloading the Java server and starting it in a terminal, in Python:
>>> from nltk.parse.corenlp import CoreNLPDependencyParser
>>> dep_parser = CoreNLPDependencyParser(url='http://localhost:9000')
>>> sent = "I shot an elephant with a banana .".split()
>>> parses = list(dep_parser.parse(sent))
>>> type(parses[0])
<class 'nltk.parse.dependencygraph.DependencyGraph'>
Now we see that each parse is of type DependencyGraph from nltk.parse.dependencygraph: https://github.com/nltk/nltk/blob/develop/nltk/parse/dependencygraph.py#L36
To convert the DependencyGraph to an nltk.tree.Tree object, simply call DependencyGraph.tree():
>>> parses[0].tree()
Tree('shot', ['I', Tree('elephant', ['an']), Tree('banana', ['with', 'a']), '.'])
>>> parses[0].tree().pretty_print()
          shot
  _________|____________
 |   |  elephant      banana
 |   |     |       _____|_____
 I   .     an    with         a
To convert it to the bracketed parse format:
>>> print(parses[0].tree())
(shot I (elephant an) (banana with a) .)
If you are looking for dependency triples:
>>> [(governor, dep, dependent) for governor, dep, dependent in parses[0].triples()]
[(('shot', 'VBD'), 'nsubj', ('I', 'PRP')), (('shot', 'VBD'), 'dobj', ('elephant', 'NN')), (('elephant', 'NN'), 'det', ('an', 'DT')), (('shot', 'VBD'), 'nmod', ('banana', 'NN')), (('banana', 'NN'), 'case', ('with', 'IN')), (('banana', 'NN'), 'det', ('a', 'DT')), (('shot', 'VBD'), 'punct', ('.', '.'))]
>>> for governor, dep, dependent in parses[0].triples():
... print(governor, dep, dependent)
...
('shot', 'VBD') nsubj ('I', 'PRP')
('shot', 'VBD') dobj ('elephant', 'NN')
('elephant', 'NN') det ('an', 'DT')
('shot', 'VBD') nmod ('banana', 'NN')
('banana', 'NN') case ('with', 'IN')
('banana', 'NN') det ('a', 'DT')
('shot', 'VBD') punct ('.', '.')
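Once you have the triples, a common next step is to look up what hangs off a given head. The following is a minimal sketch in plain Python that needs no running server: the triples are copied from the output above, and the helper name dependents_by_governor is hypothetical, not part of NLTK. It keys on the word form only, so it assumes no word occurs twice in the sentence.

```python
from collections import defaultdict

# Triples copied from the output above:
# ((governor word, governor tag), relation, (dependent word, dependent tag)).
triples = [
    (('shot', 'VBD'), 'nsubj', ('I', 'PRP')),
    (('shot', 'VBD'), 'dobj', ('elephant', 'NN')),
    (('elephant', 'NN'), 'det', ('an', 'DT')),
    (('shot', 'VBD'), 'nmod', ('banana', 'NN')),
    (('banana', 'NN'), 'case', ('with', 'IN')),
    (('banana', 'NN'), 'det', ('a', 'DT')),
    (('shot', 'VBD'), 'punct', ('.', '.')),
]


def dependents_by_governor(triples):
    """Hypothetical helper: map each governor word to its (relation, dependent) pairs."""
    index = defaultdict(list)
    for (gov_word, _gov_tag), rel, (dep_word, _dep_tag) in triples:
        index[gov_word].append((rel, dep_word))
    return dict(index)


index = dependents_by_governor(triples)
print(index['banana'])  # dependents attached to "banana"
```

With a live parser you would pass parses[0].triples() instead of the hard-coded list.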
In CoNLL format:
>>> print(parses[0].to_conll(style=10))
1 I I PRP PRP _ 2 nsubj _ _
2 shot shoot VBD VBD _ 0 ROOT _ _
3 an a DT DT _ 4 det _ _
4 elephant elephant NN NN _ 2 dobj _ _
5 with with IN IN _ 7 case _ _
6 a a DT DT _ 7 det _ _
7 banana banana NN NN _ 2 nmod _ _
8 . . . . _ 2 punct _ _
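If you save that 10-column output to a file, you can recover the triples later without the server. Here is a rough sketch using only the standard library. It assumes the column order shown above (index, word, lemma, coarse tag, tag, features, head, relation, and two unused columns) and that no field is empty or contains whitespace. The function names read_conll10 and conll_to_triples are hypothetical, not NLTK APIs.

```python
# The 10-column output shown above, pasted in as a string.
CONLL = """\
1 I I PRP PRP _ 2 nsubj _ _
2 shot shoot VBD VBD _ 0 ROOT _ _
3 an a DT DT _ 4 det _ _
4 elephant elephant NN NN _ 2 dobj _ _
5 with with IN IN _ 7 case _ _
6 a a DT DT _ 7 det _ _
7 banana banana NN NN _ 2 nmod _ _
8 . . . . _ 2 punct _ _
"""


def read_conll10(text):
    """Hypothetical helper: parse 10-column lines into one dict per token."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()  # works for tab- or space-separated columns
        if len(fields) != 10:
            continue  # skip blank or malformed lines
        rows.append({
            'id': int(fields[0]),
            'form': fields[1],
            'lemma': fields[2],
            'tag': fields[4],
            'head': int(fields[6]),
            'rel': fields[7],
        })
    return rows


def conll_to_triples(rows):
    """Hypothetical helper: build (governor, relation, dependent) triples."""
    by_id = {row['id']: row for row in rows}
    triples = []
    for row in rows:
        if row['head'] == 0:
            continue  # the root token has no governor
        gov = by_id[row['head']]
        triples.append(((gov['form'], gov['tag']), row['rel'], (row['form'], row['tag'])))
    return triples


for triple in conll_to_triples(read_conll10(CONLL)):
    print(triple)
```

Note that this yields the triples in sentence order, which differs from the traversal order that parses[0].triples() produces, although the set of triples is the same.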