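Rather than replacing "#DIV/0!" by hand, force the data to be numeric. This does two things at once: it ensures that the result has a numeric dtype (not str), and it substitutes NaN for any entry that cannot be parsed as a number. Example: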
In [5]: Series([1, 2, 'blah', '#DIV/0!']).convert_objects(convert_numeric=True)
Out[5]:
0     1
1     2
2   NaN
3   NaN
dtype: float64
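Note that convert_objects was later deprecated and is not available in pandas 1.0 and newer. A minimal sketch of the same coercion on a recent pandas, assuming pd.to_numeric with errors='coerce' is an acceptable substitute (the variable names raw and numeric are only illustrative):

```python
import pandas as pd

# Same mixed input as in the session above.
raw = pd.Series([1, 2, 'blah', '#DIV/0!'])

# errors='coerce' turns every entry that cannot be parsed as a number into NaN,
# so the result has a numeric dtype instead of object.
numeric = pd.to_numeric(raw, errors='coerce')

print(numeric)
print(numeric.dtype)
```

The two unparseable entries come back as NaN, and the presence of NaN makes the resulting dtype floating point.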
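This should fix your error. On the general subject of fitting a line to data, though, there are two methods I prefer over polyfit. The second of the two is more powerful (and can return more detailed information about the statistics), but it requires statsmodels.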
import pandas as pd
from scipy.stats import linregress

def fit_line1(x, y):
    """Return slope, intercept of best fit line."""
    # Remove entries where either x or y is NaN.
    clean_data = pd.concat([x, y], axis=1).dropna(axis=0)  # drop rows containing NaN
    (_, x), (_, y) = clean_data.items()  # unpack the two columns
    slope, intercept, r, p, stderr = linregress(x, y)
    return slope, intercept  # could also return stderr

import statsmodels.api as sm

def fit_line2(x, y):
    """Return slope, intercept of best fit line."""
    X = sm.add_constant(x)  # prepend a column of ones for the intercept
    model = sm.OLS(y, X, missing='drop')  # ignores entries where x or y is NaN
    fit = model.fit()
    intercept, slope = fit.params  # constant term first, then the x coefficient
    return slope, intercept  # could also return the stderr of each via fit.bse
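If x and y are plain NumPy arrays rather than pandas Series, the same NaN filtering can be done with a boolean mask. The following is only a sketch under that assumption; fit_line_arrays is a hypothetical helper name, and the data are synthetic points placed exactly on y = 2x + 1 so the result is easy to check.

```python
import numpy as np
from scipy.stats import linregress

def fit_line_arrays(x, y):
    """Hypothetical helper: slope, intercept for array inputs, skipping NaN pairs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~(np.isnan(x) | np.isnan(y))  # True where both x and y are present
    result = linregress(x[keep], y[keep])
    return result.slope, result.intercept

# Synthetic data on the line y = 2x + 1, with one missing value in each array.
x = np.array([0.0, 1.0, 2.0, np.nan, 4.0, 5.0])
y = np.array([1.0, 3.0, np.nan, 7.0, 9.0, 11.0])

slope, intercept = fit_line_arrays(x, y)
print(slope, intercept)
```

Only the four complete pairs are used in the fit, so the recovered slope and intercept should match the line the points were drawn from, up to floating point error.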
To plot it, do something like
import numpy as np
import matplotlib.pyplot as plt

m, b = fit_line2(x, y)
N = 100  # could be just 2 if you are only drawing a straight line...
points = np.linspace(x.min(), x.max(), N)
plt.plot(points, m*points + b)