I have the following code:
X = df_X.as_matrix(header[1:col_num])
scaler = preprocessing.StandardScaler().fit(X)
X_nor = scaler.transform(X)
and I get the following error:
File "/Users/edamame/Library/python_virenv/lib/python2.7/site-packages/sklearn/utils/validation.py", line 54, in _assert_all_finite
" or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
To look for the offending values, I used:
print(np.isinf(X))
print(np.isnan(X))
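which gave me the output below. It does not tell me which element is the problem, because the matrix has millions of rows and the printed output is truncated.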
[[False False False ..., False False False]
[False False False ..., False False False]
[False False False ..., False False False]
...,
[False False False ..., False False False]
[False False False ..., False False False]
[False False False ..., False False False]]
Is there a way to identify which value in the matrix X is actually causing the problem? And how do people generally avoid it?
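One possible way to narrow this down, as a rough sketch rather than a definitive fix: instead of printing the full boolean masks, ask NumPy for the positions where the mask is true. The helper names `find_non_finite` and `drop_non_finite_rows` and the small `X_demo` array are made up for illustration; in practice the real `X` would be passed in. Dropping the affected rows is only one option, and imputing the values may suit the data better.

```python
import numpy as np

def find_non_finite(X):
    """Return an (n, 2) array of [row, col] positions where X is NaN or +/-inf.

    Hypothetical helper, not part of NumPy or scikit-learn.
    """
    X = np.asarray(X, dtype=np.float64)
    return np.argwhere(~np.isfinite(X))

def drop_non_finite_rows(X):
    """Keep only the rows in which every entry is finite (hypothetical helper)."""
    X = np.asarray(X, dtype=np.float64)
    return X[np.isfinite(X).all(axis=1)]

# Toy stand-in for the real matrix X (assumption: X is a 2-D numeric array).
X_demo = np.array([[1.0, 2.0, 3.0],
                   [4.0, np.nan, 6.0],
                   [7.0, 8.0, np.inf]])

bad = find_non_finite(X_demo)
print("non-finite positions (row, col):")
print(bad)

# Counts are a quick summary that stays readable for millions of rows.
print("NaN count:", int(np.isnan(X_demo).sum()))
print("inf count:", int(np.isinf(X_demo).sum()))

# One way to avoid the error: remove the affected rows before scaling.
X_clean = drop_non_finite_rows(X_demo)
print("shape after dropping rows:", X_clean.shape)
```

If `bad` comes back empty for the real data, the problem is probably not a NaN or infinity in `X` itself, and checking the dtype and value range of each column would be the next step.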