I want to perform one-hot encoding on the census dataset:
https://archive.ics.uci.edu/ml/datasets/census+venue
The column I want to encode is the country column, so I did the following:
import pandas as pd
from sklearn import preprocessing

def abrirArchivo(fileR):
    head = ["gt lt 50", "age", "workclass", "fnlwgt", "edu", "edu-num", "mar-sta", "occ", "rela", "race", "sex", "cap-gain", "cap-loss", "country", "hpw"]
    f = pd.read_csv(fileR, sep=',')
    f.columns = head
    ohe = oneHot(f)
    print(ohe)

def oneHot(f):
    f[["country"]] = pd.get_dummies(f[["country"]])
    return f
But I get an error:
ValueError: Columns must be same length as key
When I use ordinal encoding instead, the following code works without problems:
pp=preprocessing.OrdinalEncoder()
f[["country"]]=pp.fit_transform(f[["country"]])
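What I want is to concatenate the transformed ohe (the dummy variables) to my original pandas DataFrame so that I can use it in a classification model.
Any help?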
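
For reference, here is a minimal sketch of the result described above. It rests on two assumptions: a small made-up DataFrame stands in for the census file, and the helper name one_hot_country is hypothetical, not part of the original code. pd.get_dummies returns one column per category, so its output cannot be assigned to the single "country" key. One common alternative is to build the dummy columns separately and join them with pd.concat:

```python
import pandas as pd

# Toy stand-in for the census frame; these rows are invented for illustration.
f = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "country": ["United-States", "Cuba", "United-States", "India"],
})

def one_hot_country(df):
    # Hypothetical helper: one dummy column per distinct country value.
    dummies = pd.get_dummies(df["country"], prefix="country")
    # Drop the original text column and append the dummy columns side by side.
    return pd.concat([df.drop(columns="country"), dummies], axis=1)

ohe = one_hot_country(f)
print(ohe.columns.tolist())
```

The other columns are left untouched, and each row has exactly one active country_* indicator. Whether to keep or drop the original country column depends on the classifier being used.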