Here is my code:
from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV
from sklearn.cross_validation import KFold
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import datasets
import numpy as np
newsgroups = datasets.fetch_20newsgroups(
    subset='all',
    categories=['alt.atheism', 'sci.space']
)
X = newsgroups.data
y = newsgroups.target
TD_IF = TfidfVectorizer()
y_scaled = TD_IF.fit_transform(newsgroups, y)
grid = {'C': np.power(10.0, np.arange(-5, 6))}
cv = KFold(y_scaled.size, n_folds=5, shuffle=True, random_state=241)
clf = SVC(kernel='linear', random_state=241)
gs = GridSearchCV(estimator=clf, param_grid=grid, scoring='accuracy', cv=cv)
gs.fit(X, y_scaled)
I get an error and I don't understand why. Traceback:
Traceback (most recent call last):
  File "C:/Users/Roman/PycharmProjects/week_3/assignment_2.py", line 23, in <module>
    gs.fit(X, y_scaled) #TODO: check this line
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\grid_search.py", line 804, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\grid_search.py", line 525, in _fit
    X, y = indexable(X, y)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\utils\validation.py", line 201, in indexable
    check_consistent_length(*result)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\utils\validation.py", line 176, in check_consistent_length
    "%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [ 6 1786]
Can someone explain why this error occurs?
I think you are mixing up your X and y here. You want to transform X into tf-idf vectors and train on those vectors against y. See the corrected code below.
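Here is a rough sketch of where the two numbers in the error message probably come from. It assumes the object returned by fetch_20newsgroups behaves like a dictionary with six keys; the key names below are a guess, and `fake_newsgroups` is a hypothetical stand-in so the snippet runs without downloading anything. Iterating over such a container yields its keys, so the vectorizer sees six short "documents" rather than the 1786 posts:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for the dataset container (key names are assumed).
fake_newsgroups = {
    "data": ["the shuttle reached orbit",
             "a debate about faith and reason",
             "the orbit of the moon"],
    "filenames": ["a.txt", "b.txt", "c.txt"],
    "target_names": ["alt.atheism", "sci.space"],
    "target": [1, 0, 1],
    "DESCR": "toy description",
    "description": "toy dataset",
}

# Passing the container itself: the vectorizer iterates over the six keys.
wrong = TfidfVectorizer().fit_transform(fake_newsgroups)

# Passing the list of texts: one row per document, matching the labels.
right = TfidfVectorizer().fit_transform(fake_newsgroups["data"])

print(wrong.shape[0], right.shape[0], len(fake_newsgroups["target"]))
```

With the roles of X and y straightened out, the script from the question becomes: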
from sklearn.svm import SVC
# grid_search and cross_validation match the scikit-learn version in the
# traceback; both modules were removed in scikit-learn 0.20.
from sklearn.grid_search import GridSearchCV
from sklearn.cross_validation import KFold
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import datasets
import numpy as np
newsgroups = datasets.fetch_20newsgroups(
    subset='all',
    categories=['alt.atheism', 'sci.space']
)
X = newsgroups.data    # raw texts
y = newsgroups.target  # class labels
TD_IF = TfidfVectorizer()
# Vectorize the texts, not the dataset container
X_scaled = TD_IF.fit_transform(X, y)
grid = {'C': np.power(10.0, np.arange(-1, 1))}
# The first argument is the number of samples, so use y.size
cv = KFold(y.size, n_folds=5, shuffle=True, random_state=241)
clf = SVC(kernel='linear', random_state=241)
gs = GridSearchCV(estimator=clf, param_grid=grid, scoring='accuracy', cv=cv)
# Train on the tf-idf matrix with the original labels
gs.fit(X_scaled, y)
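The sklearn.grid_search and sklearn.cross_validation modules used above were later replaced by sklearn.model_selection. Below is a minimal sketch of the same search with the newer API, assuming a recent scikit-learn release. The ten-document corpus and its labels are made up so the example runs offline; they are not the newsgroups data.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

# Made-up toy corpus: five "space" texts and five "religion" texts.
docs = [
    "rocket launch reached orbit",
    "the satellite orbit decayed",
    "shuttle launch was delayed",
    "moon mission and rocket fuel",
    "telescope in orbit around earth",
    "debate about faith and belief",
    "belief without evidence",
    "faith and reason in argument",
    "an argument about god and belief",
    "evidence for god is debated",
]
labels = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# Features come from the texts; the labels stay untouched.
X_tfidf = TfidfVectorizer().fit_transform(docs)

grid = {"C": np.power(10.0, np.arange(-1, 1))}
# In model_selection, KFold takes n_splits and no sample count.
cv = KFold(n_splits=5, shuffle=True, random_state=241)
clf = SVC(kernel="linear", random_state=241)
gs = GridSearchCV(estimator=clf, param_grid=grid, scoring="accuracy", cv=cv)
gs.fit(X_tfidf, labels)

print(gs.best_params_, gs.best_score_)
```

Here KFold no longer needs the number of samples up front; the splits are derived from the data passed to fit.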