As mentioned in my comment, you can save and load the model object like this:
import gensim

# Save model (trained_model is your already-trained Word2Vec instance)
filename = 'stored_model.wv' # Can be any arbitrary filename
trained_model.save(filename)
# Reload model
retrieved_model = gensim.models.Word2Vec.load(filename)
To run several queries, I suggest defining a list of queries and iterating over it to collect all the results.
# Define queries (this is the only user input required!)
my_queries = [{'positive' : ['smartthings','amazon'],
               'negative' : ['samsung']},
              {'positive' : ['light','nest'],
               'negative' : ['hue']},
              #<and so forth...>
             ]
# Initialize empty result list
query_results = []
# Collect query results
for query in my_queries:
    result = retrieved_model.wv.most_similar(**query)
    query_results.append(result)
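Finally, you can use the result list to write a csv file in whatever format you want. The file's header can be constructed so that it describes the queries.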
# Open the file
with open("my_results.csv", "w") as outfile:
    # Construct the header
    header = []
    for query in my_queries:
        head = 'pos:'+'+'.join(query['positive'])+'__neg:'+'+'.join(query['negative'])
        # First resulting head: 'pos:smartthings+amazon__neg:samsung'
        header.append(head)
    # Write the header
    # Note the additional empty fields (,_,) because each head needs two columns
    outfile.write(",_,".join(header)+",_\n")
    # Write the second row to label the columns
    outfile.write(",".join(["word,cos_sim" for i in range(len(header))])+'\n')
    # Write the data
    for i in range(len(query_results[0])):
        # Row i holds the i-th hit of every query, side by side
        row_results = [r[i][0]+','+str(r[i][1]) for r in query_results]
        outfile.write(",".join(row_results)+'\n')
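Note that this only works if every query returns the same number of items (which is the case by default, but can be changed with the topn keyword argument of most_similar).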
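If the queries might return different numbers of hits, or a word could itself contain a comma, the same table can probably be written more robustly with Python's standard csv module together with itertools.zip_longest. The following is only a minimal sketch under those assumptions, not part of the approach above: write_query_results, the file name my_results_sketch.csv, and the example tuples are hypothetical placeholders standing in for my_queries and query_results.

```python
import csv
from itertools import zip_longest


def write_query_results(path, queries, results):
    """Write one (word, cos_sim) column pair per query to a CSV file.

    Hypothetical helper: `queries` is a list of dicts with 'positive' and
    'negative' keys, `results` a list of [(word, similarity), ...] lists.
    """
    with open(path, "w", newline="") as outfile:
        writer = csv.writer(outfile)  # takes care of quoting and escaping
        # Header row: one label per query, followed by a filler cell
        header = []
        for query in queries:
            label = ("pos:" + "+".join(query.get("positive", []))
                     + "__neg:" + "+".join(query.get("negative", [])))
            header.extend([label, "_"])
        writer.writerow(header)
        # Second row labels the two columns of each query
        writer.writerow(["word", "cos_sim"] * len(queries))
        # zip_longest pads shorter result lists with None
        for row in zip_longest(*results):
            cells = []
            for item in row:
                cells.extend(item if item is not None else ("", ""))
            writer.writerow(cells)


# Illustrative placeholder data, NOT real model output
example_queries = [
    {"positive": ["smartthings", "amazon"], "negative": ["samsung"]},
    {"positive": ["light", "nest"], "negative": ["hue"]},
]
example_results = [
    [("word_a", 0.9), ("word_b", 0.8)],
    [("word_c", 0.7)],
]
write_query_results("my_results_sketch.csv", example_queries, example_results)
```

With real data you would call write_query_results("my_results.csv", my_queries, query_results). Queries with fewer hits simply get empty cells at the bottom of their two columns.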