gensim word2vec - array dimension error when updating word embeddings online

2024-01-19

Word2Vec in gensim 0.13.4.1 fails to update the word vectors on the fly.

model.build_vocab(sentences, update=False)

works fine; however,

model.build_vocab(sentences, update=True)

does not.


I am following this post, http://rutumulkar.com/blog/2015/word2vec, and trying to replicate what it does, so at one point my script runs the following:

model = gensim.models.Word2Vec()
sentences = gensim.models.word2vec.LineSentence("./text8/text8")
model.build_vocab(sentences, keep_raw_vocab=False, trim_rule=None, progress_per=10000, update=False)
model.train(sentences)

This works with update=False, but using update=True gives me the following traceback:

Traceback (most recent call last):
  File "word2vecAttempt.py", line 34, in <module>
    model.build_vocab(sentences, progress_per=10000, update=True)
  File "/home/brownc/anaconda3/lib/python3.5/site-packages/gensim/models/word2vec.py", line 535, in build_vocab
    self.finalize_vocab(update=update)  # build tables & arrays
  File "/home/brownc/anaconda3/lib/python3.5/site-packages/gensim/models/word2vec.py", line 708, in finalize_vocab
    self.update_weights()
  File "/home/brownc/anaconda3/lib/python3.5/site-packages/gensim/models/word2vec.py", line 1070, in update_weights
    self.wv.syn0 = vstack([self.wv.syn0, newsyn0])
  File "/home/brownc/anaconda3/lib/python3.5/site-packages/numpy/core/shape_base.py", line 230, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly

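I was able to reproduce your error. I think you are calling build_vocab with update=True on a model that has not been trained yet. You should only call it with update=True after the model has already been trained once.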
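To show where the ValueError likely comes from, here is a minimal NumPy-only sketch of the vstack call in the traceback. It assumes that an untrained model still holds an empty weight array (I have not checked this against the gensim source), and the names existing_weights, new_weights and trained_weights are made up for the illustration:

```python
import numpy as np

vector_size = 100  # default Word2Vec vector size in this gensim release

# Assumption: before any training, the model's weight array is empty.
existing_weights = np.array([])

# Rows that update_weights would create for the newly seen words.
new_weights = np.zeros((5, vector_size), dtype=np.float32)

try:
    # atleast_2d turns the empty array into shape (1, 0), which cannot be
    # stacked on top of an array of shape (5, 100).
    np.vstack([existing_weights, new_weights])
    stack_failed = False
except ValueError as err:
    stack_failed = True
    print("vstack failed:", err)

# Once the model has real weights, both arrays share the same width
# and the stack succeeds.
trained_weights = np.zeros((10, vector_size), dtype=np.float32)
combined = np.vstack([trained_weights, new_weights])
print(combined.shape)  # (15, 100)
```

If that assumption holds, the error has nothing to do with the corpus itself: there is simply no existing matrix of the right width to append to yet.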

This works:

import gensim

model = gensim.models.Word2Vec()
sentences = gensim.models.word2vec.LineSentence("text8")
model.build_vocab(sentences, update=False)
model.train(sentences)

model.build_vocab(sentences, update=True)
model.train(sentences)

But this fails:

import gensim

model = gensim.models.Word2Vec()
sentences = gensim.models.word2vec.LineSentence("text8")
model.build_vocab(sentences, update=True)
model.train(sentences)

ValueError: all the input array dimensions except for the concatenation axis must match exactly

This is with the latest version of gensim, 0.13.4.1.
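If the same script has to handle both the first run and later updates, one option is a small guard that picks the update flag for you. This is only a sketch: build_or_update_vocab is a hypothetical helper, not part of gensim, and it assumes the vocabulary is exposed as a dict at model.wv.vocab in this release. It also uses "the vocabulary is non-empty" as a stand-in for "the model has been trained before", which is an approximation.

```python
def build_or_update_vocab(model, sentences):
    """Hypothetical helper: only pass update=True if a vocabulary already exists.

    Assumption: the model exposes its vocabulary as a dict at model.wv.vocab.
    Returns the value that was used for the update flag.
    """
    already_has_vocab = len(model.wv.vocab) > 0
    # First call builds the vocabulary from scratch; later calls extend it.
    model.build_vocab(sentences, update=already_has_vocab)
    return already_has_vocab
```

You would call build_or_update_vocab(model, sentences) in place of model.build_vocab(...) and then call model.train(sentences) as before.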
