sklearn "Pipeline instance is not fitted yet." error, even though it is

2024-04-12

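A similar question has already been asked, but its answer did not help me solve my problem: Sklearn components in pipeline is not fitted even if the whole pipeline is? https://stackoverflow.com/questions/58704347/sklearn-components-in-pipeline-is-not-fitted-even-if-the-whole-pipeline-is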

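I am trying to use multiple pipelines to preprocess my data, applying a One Hot Encoder to the categorical features and separate steps to the numerical features (as suggested in this blog: https://adhikary.net/en/2019/03/23/categorical-and-numeric-data-in-scikit-learn-pipelines/).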

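My classifier reaches 78% accuracy, but I cannot figure out why I am unable to print the decision tree I am training, or what would fix the problem. Here is the code snippet: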

import pandas as pd
import sklearn.tree as tree
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder  
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer


X = pd.DataFrame(data=data)  
Y = pd.DataFrame(data=prediction)

categoricalFeatures = ["race", "gender"]
numericalFeatures = ["age", "number_of_actions"]

categoricalTransformer = Pipeline(steps=[
    ('onehot', OneHotEncoder(handle_unknown='ignore')),
])

numericTransformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler()),
])

preprocessor = ColumnTransformer(transformers=[
    ('num', numericTransformer, numericalFeatures),
    ('cat', categoricalTransformer, categoricalFeatures)
])

classifier = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', tree.DecisionTreeClassifier(max_depth=3))
])

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=12, stratify=Y)

classifier.fit(X_train, y_train)
print("model score: %.3f" % classifier.score(X_test, y_test))  # Prints accuracy of 0.78

text_representation = tree.export_text(classifier)

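Although the model has been fitted, the last command produces this error (I assume it is some kind of synchronization issue, but I have no idea how to solve it):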

sklearn.exceptions.NotFittedError: This Pipeline instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

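You cannot use export_text on the whole pipeline, because it only accepts decision tree objects, i.e. DecisionTreeClassifier or DecisionTreeRegressor. Passing only the fitted estimator from the pipeline works: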

text_representation = tree.export_text(classifier['classifier'])
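For a version that runs end to end, here is a minimal sketch of the same idea. The toy DataFrame and the label derived from age are invented purely for illustration, since the question's real `data` and `prediction` are not shown; the pipeline layout loosely follows the question's.

```python
import pandas as pd
from sklearn import tree
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: the question's real `data` and `prediction` are unknown.
X = pd.DataFrame({
    "race": ["a", "b", "a", "c", "b", "c", "a", "b"],
    "gender": ["f", "m", "m", "f", "f", "m", "f", "m"],
    "age": [22, 35, 47, 51, 29, 63, 38, 44],
    "number_of_actions": [1, 4, 9, 12, 2, 15, 6, 10],
})
y = (X["age"] > 40).astype(int)  # label invented for illustration only

preprocessor = ColumnTransformer(transformers=[
    ("num", Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ]), ["age", "number_of_actions"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["race", "gender"]),
])

classifier = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", tree.DecisionTreeClassifier(max_depth=3, random_state=0)),
])
classifier.fit(X, y)

# Pull the fitted tree out of the pipeline and export that, not the pipeline.
fitted_tree = classifier.named_steps["classifier"]
text_representation = tree.export_text(fitted_tree)
print(text_representation)
```

The printed rules refer to generic names such as feature_0, because the tree only ever sees the preprocessed array and not the original column names.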

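The error message claiming that the Pipeline object is not fitted comes from scikit-learn's check_is_fitted function (https://github.com/scikit-learn/scikit-learn/blob/15a949460dbf19e5e196b8ef48f9712b72a3b3c3/sklearn/utils/validation.py#L1035). It works by checking whether fitted attributes (names ending with a trailing underscore) are present on the estimator. Since Pipeline objects do not expose such attributes, the check fails and raises the error, even though the pipeline really is fitted. That is not a problem, though, because Pipeline objects are not meant to be used this way anyway.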
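To see the difference for yourself, the sketch below lists the instance attributes that end with an underscore on a fitted tree and on the fitted pipeline around it. The helper `fitted_attributes` is a hypothetical, simplified stand-in for the kind of check described above, not scikit-learn's actual implementation, and newer scikit-learn releases may report a different error for the same mistake.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def fitted_attributes(estimator):
    """List instance attributes ending in '_' (hypothetical, simplified check)."""
    return sorted(
        name for name in vars(estimator)
        if name.endswith("_") and not name.startswith("__")
    )


# Tiny made-up dataset, just enough to fit a one-split tree.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

pipe = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("classifier", DecisionTreeClassifier(max_depth=1, random_state=0)),
])
pipe.fit(X, y)

# The fitted tree stores attributes such as tree_ directly on the instance.
tree_attrs = fitted_attributes(pipe.named_steps["classifier"])
# The pipeline keeps its state inside its steps, so this list is expected
# to contain no such attributes.
pipe_attrs = fitted_attributes(pipe)

print("tree:", tree_attrs)
print("pipeline:", pipe_attrs)
```

This is why extracting the final step, as in the fix above, gives export_text an object that carries the fitted attributes it looks for.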
