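While using statsmodels, I get this strange error: ValueError: endog must be in the unit interval.
Can anyone give me more information about this error? Googling it has not helped.
The code that produces the error: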
"""
Multiple regression with dummy variables.
"""
import pandas as pd
import statsmodels.api as sm
import pylab as pl
import numpy as np
df = pd.read_csv('cost_data.csv')
df.columns = ['Cost', 'R(t)', 'Day of Week']
dummy_ranks = pd.get_dummies(df['Day of Week'], prefix='days')
cols_to_keep = ['Cost', 'R(t)']
data = df[cols_to_keep].join(dummy_ranks.loc[:, 'days_2':])
data['intercept'] = 1.0
print(data)
train_cols = data.columns[1:]
logit = sm.Logit(data['Cost'], data[train_cols])
result = logit.fit()
print(result.summary())
And the traceback:
Traceback (most recent call last):
File "multiple_regression_dummy.py", line 20, in <module>
logit = sm.Logit(data['Cost'], data[train_cols])
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/statsmodels/discrete/discrete_model.py", line 404, in __init__
raise ValueError("endog must be in the unit interval.")
ValueError: endog must be in the unit interval.
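I got this error when my target column contained values greater than 1. Here "endog" is the dependent variable, the first argument passed to sm.Logit (data['Cost'] in your code), and the check fails if any of its values fall outside the interval [0, 1].
Make sure your target column lies between 0 and 1, as logistic regression requires, and then try again.
For example, if the target column takes the values 1-5, treat 4 and 5 as the positive class (1) and 1, 2 and 3 as the negative class (0). Hope this helps.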
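The recoding described above can be sketched as follows. This is only a minimal illustration: the helper name binarize_target, the sample ratings and the threshold of 4 are assumptions made for the example, not part of the original code, and a sensible cut-off for a column such as 'Cost' depends on your data.

```python
import pandas as pd


def binarize_target(values, threshold):
    """Hypothetical helper: map values >= threshold to 1 and the rest to 0."""
    return (pd.Series(values) >= threshold).astype(int)


# Made-up ratings on a 1-5 scale, standing in for the real target column.
ratings = pd.Series([1, 2, 3, 4, 5, 4, 2])

# 4 and 5 become the positive class (1); 1, 2 and 3 the negative class (0).
target = binarize_target(ratings, threshold=4)
print(target.tolist())  # [0, 0, 0, 1, 1, 1, 0]

# Every value now lies in [0, 1], which is the condition the error complains about.
assert target.between(0, 1).all()
```

A column recoded this way can then be passed to sm.Logit as the first argument in place of the original target. If the target is really a continuous quantity that you want to predict as a number, a binary recoding may discard information, so it is worth checking whether logistic regression is the model you actually need.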