I have a list of colors:
initialColors = [u'black' u'black' u'black' u'white' u'white' u'white' u'powderblue'
u'whitesmoke' u'black' u'cornflowerblue' u'powderblue' u'powderblue'
u'goldenrod' u'white' u'lavender' u'white' u'powderblue' u'powderblue'
u'powderblue' u'powderblue' u'powderblue' u'powderblue' u'powderblue'
u'powderblue' u'white' u'white' u'powderblue' u'white' u'white']
I have labels for these colors, as follows:
labels_train = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
0 means the color was chosen by a female and 1 means it was chosen by a male. I will use another set of colors to predict gender.
So, for my initial colors, I convert the names into a numeric feature vector like this:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(initialColors)
features_train = le.transform(initialColors)
After that, my features_train looks like this:
[0 0 0 5 5 5 4 6 0 1 4 4 2 5 3 5 4 4 4 4 4 4 4 4 5 5 4 5 5]
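To make the encoding step easier to follow, here is a minimal sketch of which integer each color name receives. It assumes the 29 names listed above are held in a plain Python list (called initial_colors here, a name introduced only for this illustration) and relies on LabelEncoder assigning codes in sorted order of the unique names.

```python
from sklearn import preprocessing

# Color names copied from initialColors above (29 entries).
initial_colors = [
    'black', 'black', 'black', 'white', 'white', 'white', 'powderblue',
    'whitesmoke', 'black', 'cornflowerblue', 'powderblue', 'powderblue',
    'goldenrod', 'white', 'lavender', 'white', 'powderblue', 'powderblue',
    'powderblue', 'powderblue', 'powderblue', 'powderblue', 'powderblue',
    'powderblue', 'white', 'white', 'powderblue', 'white', 'white',
]

le = preprocessing.LabelEncoder()
features_train = le.fit_transform(initial_colors)

# classes_ holds the sorted unique names; each name's index is its code.
for code, name in enumerate(le.classes_):
    print("{0} -> {1}".format(code, name))

# The encoded result is a 1-D array with one entry per color.
print(features_train.shape)  # (29,)
```

Note that the result is one-dimensional, which is what the later fit call complains about.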
Finally, I do this:
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
clf.fit(features_train, labels_train)
But I get an error:
/Library/Python/2.7/site-packages/sklearn/utils/validation.py:395: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
DeprecationWarning)
Traceback (most recent call last):
  File "app.py", line 36, in <module>
    clf.fit(features_train, labels_train)
  File "/Library/Python/2.7/site-packages/sklearn/naive_bayes.py", line 182, in fit
    X, y = check_X_y(X, y)
  File "/Library/Python/2.7/site-packages/sklearn/utils/validation.py", line 531, in check_X_y
    check_consistent_length(X, y)
  File "/Library/Python/2.7/site-packages/sklearn/utils/validation.py", line 181, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [1, 70]
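The message reports 1 sample for X, which seems consistent with the 1-D feature array being read as a single row. As a rough illustration of the shapes the warning talks about, here is a sketch using only NumPy, with the 29 encoded values and labels copied from above:

```python
import numpy as np

# Encoded colors and labels copied from the question (29 entries each).
features_train = np.array([0, 0, 0, 5, 5, 5, 4, 6, 0, 1, 4, 4, 2, 5, 3,
                           5, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 4, 5, 5])
labels_train = np.array([0] * 15 + [1] * 14)

print(features_train.shape)                 # (29,)   1-D, ambiguous for X
print(features_train.reshape(-1, 1).shape)  # (29, 1) 29 samples, 1 feature
print(features_train.reshape(1, -1).shape)  # (1, 29) 1 sample, 29 features
print(labels_train.shape)                   # (29,)   one label per sample
```

Only the (29, 1) layout has a sample count that matches the 29 labels.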
So I tried:
features_train = features_train.reshape(-1, 1)
labels_train = labels_train.reshape(-1, 1)
clf.fit(features_train, labels_train)
Now I get this warning:
/Library/Python/2.7/site-packages/sklearn/utils/validation.py:526: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
y = column_or_1d(y, warn=True)
I also tried:
features_train = features_train.reshape(1, -1)
labels_train = labels_train.reshape(1, -1)
But that fails as well:
Traceback (most recent call last):
  File "app.py", line 36, in <module>
    clf.fit(features_train, labels_train)
  File "/Library/Python/2.7/site-packages/sklearn/naive_bayes.py", line 182, in fit
    X, y = check_X_y(X, y)
  File "/Library/Python/2.7/site-packages/sklearn/utils/validation.py", line 526, in check_X_y
    y = column_or_1d(y, warn=True)
  File "/Library/Python/2.7/site-packages/sklearn/utils/validation.py", line 562, in column_or_1d
    raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape (1, 29)
My problem is that I don't understand what the best way to reshape the data is in my case. Could you help me choose a way to reshape the data?
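Going by the two messages above (X.reshape(-1, 1) when the data has a single feature, and y of shape (n_samples,)), one combination that appears consistent with both is a column-shaped X together with a 1-D y. The following is only a sketch under that assumption, not a verified fix; new_colors is a made-up example list, not real data, and it only contains names the encoder has already seen.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.naive_bayes import GaussianNB

# Color names and labels copied from the question (29 entries each).
initial_colors = (['black'] * 3 + ['white'] * 3 +
                  ['powderblue', 'whitesmoke', 'black', 'cornflowerblue',
                   'powderblue', 'powderblue', 'goldenrod', 'white',
                   'lavender', 'white'] +
                  ['powderblue'] * 8 +
                  ['white', 'white', 'powderblue', 'white', 'white'])
labels_train = np.array([0] * 15 + [1] * 14)

le = preprocessing.LabelEncoder()
# X becomes a (29, 1) column: 29 samples with a single feature each.
features_train = le.fit_transform(initial_colors).reshape(-1, 1)

clf = GaussianNB()
# y is left 1-D with shape (29,), as the DataConversionWarning asks.
clf.fit(features_train, labels_train)

# Hypothetical new colors; they get the same encoding and the same reshape.
new_colors = ['black', 'powderblue', 'white']
features_test = le.transform(new_colors).reshape(-1, 1)
predictions = clf.predict(features_test)
print(predictions.shape)  # (3,)
```

The same reshape(-1, 1) is applied to the prediction input so that both arrays passed to the classifier have one row per color.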