From NLTK's GitHub https://github.com/nltk/nltk/issues/390#issuecomment-53171900:
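FreqDist in NLTK3 is a wrapper for collections.Counter; Counter provides the most_common() method to return items in order. The FreqDist.keys() method is provided by the standard library; it is not overridden. I think it is good that we are becoming more compatible with the stdlib.
The docs at googlecode are very old; they are from 2011. More up-to-date docs can be found on the http://nltk.org website.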
So for NLTK version 3, instead of fdist1.keys()[:50], use fdist1.most_common(50).
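As a minimal sketch of the difference, the snippet below uses collections.Counter directly as a stand-in for FreqDist, so it runs without NLTK or its corpora. The token list and variable names are made up for illustration. On Python 3, keys() returns a view that cannot be sliced, while most_common(n) returns (sample, count) pairs sorted by descending count.

```python
from collections import Counter

# Hypothetical toy data standing in for text1; not from the NLTK book.
tokens = "the whale and the sea and the ship".split()
fdist = Counter(tokens)  # stand-in for FreqDist(tokens)

# Old idiom: on Python 3, keys() is a view object, so slicing raises TypeError.
try:
    fdist.keys()[:2]
    slice_ok = True
except TypeError:
    slice_ok = False

# Replacement: most_common(n) gives the n most frequent samples with counts.
top2 = fdist.most_common(2)
print(top2)  # [('the', 3), ('and', 2)]

# If only the samples are needed (what keys()[:n] used to return), drop the counts.
top_words = [word for word, _count in top2]
print(top_words)  # ['the', 'and']
```

If older code expects a plain list of words, the list comprehension at the end gives the closest equivalent to the old fdist1.keys()[:50] result.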
The tutorial http://www.nltk.org/book/ch01.html#frequency-distributions has also been updated:
>>> fdist1 = FreqDist(text1)
>>> print(fdist1)
<FreqDist with 19317 samples and 260819 outcomes>
>>> fdist1.most_common(50)
[(',', 18713), ('the', 13721), ('.', 6862), ('of', 6536), ('and', 6024),
('a', 4569), ('to', 4542), (';', 4072), ('in', 3916), ('that', 2982),
("'", 2684), ('-', 2552), ('his', 2459), ('it', 2209), ('I', 2124),
('s', 1739), ('is', 1695), ('he', 1661), ('with', 1659), ('was', 1632),
('as', 1620), ('"', 1478), ('all', 1462), ('for', 1414), ('this', 1280),
('!', 1269), ('at', 1231), ('by', 1137), ('but', 1113), ('not', 1103),
('--', 1070), ('him', 1058), ('from', 1052), ('be', 1030), ('on', 1005),
('so', 918), ('whale', 906), ('one', 889), ('you', 841), ('had', 767),
('have', 760), ('there', 715), ('But', 705), ('or', 697), ('were', 680),
('now', 646), ('which', 640), ('?', 637), ('me', 627), ('like', 624)]
>>> fdist1['whale']
906