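Pretty simple. First read the dictionary and convert the keys to the appropriate row and column. Scipy supports (and recommends for this purpose) the coordinate (COO) format http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.coo_matrix.html#scipy.sparse.coo_matrix for sparse matrices.
Pass it data, row, and column, where A[row[k], column[k]] = data[k] (for all k) defines the matrix. Then let Scipy do the conversion to CSR.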
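To make the triplet idea concrete, here is a minimal sketch with hand-written lists. The values are only illustrative, and the explicit shape argument is my own addition; if it is omitted, Scipy infers the shape from the largest indices.

```python
from scipy.sparse import coo_matrix

# Illustrative triplets: A[row[k], col[k]] = data[k] for every k.
data = [12, 10, 5]
row = [0, 1, 2]
col = [0, 2, 1]

# Build the COO matrix; shape is given explicitly here (an assumption for this sketch).
coo = coo_matrix((data, (row, col)), shape=(3, 3))

# Let Scipy convert COO to CSR.
csr = coo.tocsr()

# A dense view, handy for checking a small example by eye.
dense = csr.toarray()
print(dense)
```

COO is convenient for construction because entries can be listed in any order; CSR is the better choice once you start doing arithmetic or row slicing.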
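Please check that I have the rows and columns the way you want them; I may have them transposed. I also assumed the input is 1-indexed.
The code listed under "Code:" below prints: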
(0, 0) 12
(1, 2) 10
(2, 1) 5
Code:
#!/usr/bin/env python3
# http://stackoverflow.com/questions/26335059/converting-python-sparse-matrix-dict-to-scipy-sparse-matrix
from scipy.sparse import csr_matrix, coo_matrix


def convert(term_dict):
    ''' Convert a dictionary with elements of form ('d1', 't1'): 12 to a CSR type matrix.
    The element ('d1', 't1'): 12 becomes entry (0, 0) = 12.
    * Conversion from 1-indexed to 0-indexed.
    * d is row
    * t is column.
    '''
    # Create the appropriate format for the COO format.
    data = []
    row = []
    col = []
    for k, v in term_dict.items():
        r = int(k[0][1:])
        c = int(k[1][1:])
        data.append(v)
        row.append(r - 1)
        col.append(c - 1)
    # Create the COO-matrix
    coo = coo_matrix((data, (row, col)))
    # Let Scipy convert COO to CSR format and return
    return csr_matrix(coo)


if __name__ == '__main__':
    doc_term_dict = {('d1', 't1'): 12,
                     ('d2', 't3'): 10,
                     ('d3', 't2'): 5}
    print(convert(doc_term_dict))
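
If the rows and columns turn out to be swapped relative to what you need, one option is to transpose the result instead of changing the key parsing. A small sketch, rebuilding the same three entries directly so it runs on its own (the variable names are only for illustration):

```python
from scipy.sparse import coo_matrix

# Same three entries as in the example dictionary, already 0-indexed.
coo = coo_matrix(([12, 10, 5], ([0, 1, 2], [0, 2, 1])), shape=(3, 3))
csr = coo.tocsr()

# Transposing a CSR matrix gives a CSC matrix, so convert back if CSR is required.
transposed = csr.transpose().tocsr()

print(transposed.toarray())
```

After the transpose, the entry that was at (1, 2) sits at (2, 1), and vice versa.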