我尝试在 sklearn 管道中第一次使用 featureunion 来组合数字特征(2 列)和文本特征(1 列)以进行多类分类。
from sklearn.preprocessing import FunctionTransformer
from sklearn.pipeline import Pipeline
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion
get_text_data = FunctionTransformer(lambda x: x['text'], validate=False)
get_numeric_data = FunctionTransformer(lambda x: x[['num1','num2']], validate=False)
process_and_join_features = FeatureUnion(
[
('numeric_features', Pipeline([
('selector', get_numeric_data),
('clf', OneVsRestClassifier(LogisticRegression()))
])),
('text_features', Pipeline([
('selector', get_text_data),
('vec', CountVectorizer()),
('clf', OneVsRestClassifier(LogisticRegression()))
]))
]
)
在此代码中,“text”是文本列,“num1”、“num2”是 2 个数字列。
错误信息是
TypeError: All estimators should implement fit and transform. 'Pipeline(memory=None,
steps=[('selector', FunctionTransformer(accept_sparse=False,
func=<function <lambda> at 0x7fefa8efd840>, inv_kw_args=None,
inverse_func=None, kw_args=None, pass_y='deprecated',
validate=False)), ('clf', OneVsRestClassifier(estimator=LogisticRegression(C=1.0, class_weigh...=None, solver='liblinear', tol=0.0001,
verbose=0, warm_start=False),
n_jobs=1))])' (type <class 'sklearn.pipeline.Pipeline'>) doesn't
我错过了什么步骤吗?