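I have a pandas DataFrame df. I want to encode the continuous and categorical features of df using different encoders. I find make_column_transformer very convenient, but the code shown below fails with LabelEncoder(), although it works fine with OneHotEncoder(handle_unknown='ignore'). The error message is:
TypeError: fit_transform() takes 2 positional arguments but 3 were given
It is not clear to me how to fix this.
Code: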
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import RobustScaler, OneHotEncoder, LabelEncoder
continuous_features = ['COL1','COL2']
categorical_features = ['COL3','COL4']
column_trans = make_column_transformer(
(categorical_features,LabelEncoder()),
(continuous_features, RobustScaler()))
X_enc = column_trans.fit_transform(df)
According to the documentation at https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_transformer.html, each tuple is given as (transformer, columns):
make_column_transformer(
... (StandardScaler(), ['numerical_column']),
... (OneHotEncoder(), ['categorical_column']))
So for your case:
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import RobustScaler, OneHotEncoder
continuous_features = ['COL1','COL2']
categorical_features = ['COL3','COL4']
# Each tuple is (transformer, columns), in that order
column_trans = make_column_transformer(
    (OneHotEncoder(), categorical_features),
    (RobustScaler(), continuous_features))
# Fit every transformer on its own columns and concatenate the results
X_enc = column_trans.fit_transform(df)
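To check the corrected call end to end, here is a minimal sketch on a made-up DataFrame. Only the column names come from the question; the values are invented for illustration, and handle_unknown='ignore' is carried over from the question's working variant.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Hypothetical toy data; only the column names come from the question.
df = pd.DataFrame({
    "COL1": [1.0, 2.0, 3.0, 4.0],
    "COL2": [10.0, 20.0, 30.0, 40.0],
    "COL3": ["a", "b", "a", "c"],
    "COL4": ["x", "y", "x", "x"],
})

continuous_features = ["COL1", "COL2"]
categorical_features = ["COL3", "COL4"]

# Tuples are (transformer, columns).
column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), categorical_features),
    (RobustScaler(), continuous_features),
)
X_enc = column_trans.fit_transform(df)

# COL3 has 3 categories and COL4 has 2, plus 2 scaled columns: 7 in total.
print(X_enc.shape)
```

The one-hot columns come first in the output because the transformers are applied in the order they are listed.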
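If you want to use LabelEncoder(), note that it works on a single 1-D column only, not two! It is meant for encoding the target y: its fit_transform() accepts only y, while the column transformer calls fit_transform(X, y), which is what raises the TypeError above. For integer codes on feature columns, OrdinalEncoder() is the usual substitute.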
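LabelEncoder is designed for the target y, so its fit_transform() takes a single argument, and a call that passes both X and y fails with the TypeError from the question. As a minimal sketch, assuming one integer code per category is what is wanted, OrdinalEncoder can be swapped in for the categorical feature columns. The toy values below are invented; only the column names come from the question.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder, RobustScaler

# Hypothetical toy data; only the column names come from the question.
df = pd.DataFrame({
    "COL1": [1.0, 2.0, 3.0, 4.0],
    "COL2": [10.0, 20.0, 30.0, 40.0],
    "COL3": ["a", "b", "a", "c"],
    "COL4": ["x", "y", "x", "x"],
})

# LabelEncoder.fit_transform accepts only y, so passing X and y raises TypeError.
try:
    LabelEncoder().fit_transform(df[["COL3"]], None)
    label_encoder_error = None
except TypeError as exc:
    label_encoder_error = exc
print(type(label_encoder_error).__name__)

# OrdinalEncoder handles several feature columns at once.
column_trans = make_column_transformer(
    (OrdinalEncoder(), ["COL3", "COL4"]),
    (RobustScaler(), ["COL1", "COL2"]),
)
X_enc = column_trans.fit_transform(df)

# One integer-coded column per categorical feature, plus two scaled columns.
print(X_enc.shape)
```

Keep in mind that integer codes imply an ordering between categories, which some models will treat as meaningful; OneHotEncoder avoids that at the cost of more columns.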
Hope this helps.