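You need to impute the missing values first. You can define a Pipeline (https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) with an imputation step that uses SimpleImputer (https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html) with the constant strategy, so that null fields are assigned a new category before the one-hot encoding: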
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
import numpy as np
import pandas as pd
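
# Replace NaN with the literal category 'missing', then one-hot encode.
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('encoder', OneHotEncoder(handle_unknown='ignore'))])

# Apply the categorical pipeline to the first column (index 0).
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', categorical_transformer, [0])
    ])
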
df = pd.DataFrame(['Male', 'Female', np.nan])
preprocessor.fit_transform(df)
array([[0., 1., 0.],
       [1., 0., 0.],
       [0., 0., 1.]])
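
To see which output column corresponds to which category, and how the fitted pipeline treats new data, here is a minimal sketch built on the same pipeline. Two things in it are my own assumptions and not part of the answer above: `sparse_threshold=0` is passed to `ColumnTransformer` only to force a dense array that is easy to inspect, and `'Other'` is a hypothetical value standing in for a category that was not present during fitting.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('encoder', OneHotEncoder(handle_unknown='ignore')),
])

# Assumption: sparse_threshold=0 forces a dense NumPy array for inspection.
preprocessor = ColumnTransformer(
    transformers=[('cat', categorical_transformer, [0])],
    sparse_threshold=0,
)

train = pd.DataFrame(['Male', 'Female', np.nan])
encoded_train = preprocessor.fit_transform(train)

# Categories learned by the encoder, in output column order.
encoder = preprocessor.named_transformers_['cat'].named_steps['encoder']
categories = list(encoder.categories_[0])
print(categories)

# 'Other' is a hypothetical value that was not seen during fit.
new_rows = pd.DataFrame([np.nan, 'Other'])
encoded_new = preprocessor.transform(new_rows)
print(encoded_new)
```

The imputed `'missing'` category gets its own column, so a NaN in new data is encoded like any other known category. An unseen value such as `'Other'` is encoded as a row of zeros because of `handle_unknown='ignore'`.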