pyLDAvis: Validation error when trying to visualize topics with BTM

2024-01-11

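I am trying to generate topics with BTM (https://github.com/markoarnauto/biterm). When I try to visualize the topics, I get a validation error. I can print the topics after the model is trained, but the pyLDAvis step fails.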

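import numpy as np
import pyLDAvis
from sklearn.feature_extraction.text import CountVectorizer
from biterm.btm import oBTM
from biterm.utility import vec_to_biterms, topic_summuary

def btm_model():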
    num_topics = 10
    texts = open('./textfiles/Ori-Apr2, 2019.txt').read().splitlines()
    # vectorize texts
    vec = CountVectorizer(stop_words='english')
    X = vec.fit_transform(texts).toarray()
    # get vocabulary
    vocab = np.array(vec.get_feature_names())
    # get biterms
    biterms = vec_to_biterms(X)
    # create btm
    btm = oBTM(num_topics = num_topics, V = vocab)
    print("\n\n Train Online BTM ..")
    for i in range(0, 1): 
        biterms_chunk = biterms[i:i + 100]
        btm.fit(biterms_chunk, iterations=10)

    print("\n\n Topic coherence ..")
    res, C_z_sum = topic_summuary(btm.phi_wz.T, X, vocab, 10)

    topics = btm.transform(biterms)
    print("\n\n Visualize Topics ..")
    vis = pyLDAvis.prepare(btm.phi_wz.T, topics, np.count_nonzero(X, axis=1), vocab, np.sum(X, axis=0))
    pyLDAvis.save_html(vis, './textfiles/online_btm.html')

Running the pyLDAvis step above produces the following error:

Traceback (most recent call last):
  File "main_mining.py", line 293, in <module>
    btm_model(num_topics)
  File "main_mining.py", line 187, in btm_model
    vis = pyLDAvis.prepare(btm.phi_wz.T, topics, np.count_nonzero(X, axis=1), vocab, np.sum(X, axis=0))
  File "C:\Python Install Location\lib\site-packages\pyLDAvis\_prepare.py", line 375, in prepare
    _input_validate(topic_term_dists, doc_topic_dists, doc_lengths, vocab, term_frequency)
  File "C:\Python Install Location\lib\site-packages\pyLDAvis\_prepare.py", line 65, in _input_validate
    raise ValidationError('\n' + '\n'.join([' * ' + s for s in res]))
pyLDAvis._prepare.ValidationError:
 * Not all rows (distributions) in doc_topic_dists sum to 1.

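In my case, this happened because some of my sentences had only a few tokens. I removed all sentences with fewer than three tokens, and it worked like a charm. Presumably a sentence with fewer than two distinct terms yields no biterms at all, so its row in doc_topic_dists cannot be normalized to sum to 1.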
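
The filtering step described above can be sketched as follows. This is only a minimal illustration, not the answerer's actual code: `drop_short_docs` and `rows_not_summing_to_one` are hypothetical helper names, the threshold of three comes from the answer, and "tokens" is assumed to mean distinct vocabulary terms left after stop-word removal (the same quantity the question passes to pyLDAvis as `np.count_nonzero(X, axis=1)`).

```python
import numpy as np


def drop_short_docs(X, texts, min_tokens=3):
    """Keep only documents with at least `min_tokens` distinct vocabulary terms.

    X is the document-term count matrix and texts the matching list of raw
    documents, mirroring the names in the question. Hypothetical helper.
    """
    X = np.asarray(X)
    keep = np.count_nonzero(X, axis=1) >= min_tokens
    kept_texts = [text for text, flag in zip(texts, keep) if flag]
    return X[keep], kept_texts


def rows_not_summing_to_one(doc_topic_dists):
    """Return indices of rows whose sum is not close to 1 (NaN rows included)."""
    sums = np.asarray(doc_topic_dists, dtype=float).sum(axis=1)
    return np.flatnonzero(~np.isclose(sums, 1.0))


# Toy data, made up for illustration only.
X_demo = np.array([
    [1, 0, 0, 0],   # one distinct term: no biterm can be formed
    [1, 1, 0, 0],   # two distinct terms
    [2, 1, 1, 0],   # three distinct terms
    [1, 1, 1, 1],   # four distinct terms
])
texts_demo = ["a", "a b", "a a b c", "a b c d"]

X_kept, texts_kept = drop_short_docs(X_demo, texts_demo)
print(X_kept.shape, texts_kept)

# A made-up doc-topic matrix with one NaN row and one unnormalized row.
dists_demo = np.array([[0.5, 0.5], [np.nan, np.nan], [0.2, 0.3]])
print(rows_not_summing_to_one(dists_demo))
```

In the question's code, the filtered matrix would be passed to `vec_to_biterms` in place of `X`, and the same filtered matrix should be used for the document lengths and term frequencies given to `pyLDAvis.prepare` so that all inputs describe the same set of documents. Calling `rows_not_summing_to_one(topics)` before `pyLDAvis.prepare` is one way to see which documents still trigger the validation error.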
