I am running the following script:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
dataset = pd.read_csv('data/50_Startups.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values
onehotencoder = OneHotEncoder(categorical_features=3,
                              handle_unknown='ignore')
onehotencoder.fit(X)
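The head of the data looks like this: https://i.stack.imgur.com/mEkVF.png
And I get this error:
ValueError: could not convert string to float: 'New York'
I read the answers to a similar question https://stackoverflow.com/questions/8420143/valueerror-could-not-convert-string-to-float-id and then opened the scikit-learn documentation https://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features, but as you can see there, the scikit-learn authors do not run into any problem with spaces in strings.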
I know that I can use LabelEncoder from sklearn.preprocessing and then apply OHE, which works well, but in that case
In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers, then you can now use the OneHotEncoder directly.
warnings.warn(msg, FutureWarning)
this warning message appears.
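For reference, the two-step approach mentioned above can be sketched roughly as follows. This is only an illustration on the `State` values from the sample rows, not the exact code that was run: the `states`, `labels` and `onehot` names are made up for the example, and the encoder is applied to the single column directly rather than through `categorical_features`.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# State values taken from the five sample rows (illustrative subset).
states = np.array(['New York', 'California', 'Florida', 'New York', 'Florida'])

# Step 1: map each string category to an integer code.
# LabelEncoder orders classes alphabetically:
# California -> 0, Florida -> 1, New York -> 2.
labels = LabelEncoder().fit_transform(states)

# Step 2: one-hot encode the integer codes.
# OneHotEncoder expects a 2-D input, hence the reshape to a single column.
onehot = OneHotEncoder().fit_transform(labels.reshape(-1, 1)).toarray()

print(labels)
print(onehot)
```

On the scikit-learn versions that emit the FutureWarning quoted above, step 2 is the call that triggers it, because the encoder receives integer input.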
You can use the full CSV file https://pastebin.com/RtwkfsHJ or
[[165349.2, 136897.8, 471784.1, 'New York', 192261.83],
[162597.7, 151377.59, 443898.53, 'California', 191792.06],
[153441.51, 101145.55, 407934.54, 'Florida', 191050.39],
[144372.41, 118671.85, 383199.62, 'New York', 182901.99],
[142107.34, 91391.77, 366168.42, 'Florida', 166187.94]]
the first 5 rows to test this code.
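To try this without downloading the file, the five rows above can be loaded into a DataFrame directly. This is a minimal sketch: the `rows` name is only for illustration, and the columns keep their default integer labels because the real header is only visible in the screenshot.

```python
import pandas as pd

# The five sample rows from the question, in the original column order.
rows = [[165349.2, 136897.8, 471784.1, 'New York', 192261.83],
        [162597.7, 151377.59, 443898.53, 'California', 191792.06],
        [153441.51, 101145.55, 407934.54, 'Florida', 191050.39],
        [144372.41, 118671.85, 383199.62, 'New York', 182901.99],
        [142107.34, 91391.77, 366168.42, 'Florida', 166187.94]]

# Stand-in for pd.read_csv('data/50_Startups.csv'), without a header row.
dataset = pd.DataFrame(rows)

# Same slicing as in the script: every column but the last as features,
# the column at position 4 as the target.
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values

print(X.dtype)   # mixed numeric and string columns give an object array
print(X[0])
```

Because `X` mixes floats and strings, it is an object array whose column 3 holds the state names, which is the column the encoder is pointed at.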