Residual Diagnostics for Simple Linear Regression

2023-11-01

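In 1973, Anscombe constructed four data sets that all yield the same fitted regression equation,
y = 3.00 + 0.500x,
with the same coefficient of determination, r² = 0.667,
and the same correlation coefficient, r = 0.816.
Because the fitted equation and r² agree across the four data sets, the F statistic agrees as well,
and every fit passes the significance test, which suggests a significant linear relationship between y and x in each case.
But do y and x really follow the same linear relationship in all four data sets?
The scatter plots of the four data sets (see Figure 2.7) show that the relationships between y and x are in fact very different.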
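To see why the F statistic has to agree once r² does, it may help to write out the standard formulas for simple linear regression with n observations. The symbols S_xx, S_xy and S_yy (the centered sums of squares and cross-products) are notation introduced here for illustration and are not used elsewhere in this post; the numerical step plugs in n = 11 and the rounded value r² = 0.667 quoted above.

```latex
\begin{aligned}
\hat\beta_1 &= \frac{S_{xy}}{S_{xx}}, \qquad
\hat\beta_0 = \bar y - \hat\beta_1 \bar x, \\
r^2 &= \frac{S_{xy}^2}{S_{xx}\, S_{yy}}, \\
F &= \frac{(n-2)\, r^2}{1 - r^2}
   = \frac{9 \times 0.667}{0.333} \approx 18.0 .
\end{aligned}
```

None of these quantities depends on how the points are arranged around the line, which is why they cannot tell the four data sets apart.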

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf

from io import StringIO  # recent pandas versions expect a file-like object, not a literal JSON string

df = pd.read_json(StringIO('{"x1":{"0":4,"1":5,"2":6,"3":7,"4":8,"5":9,"6":10,"7":11,"8":12,"9":13,"10":14},"y1":{"0":4.26,"1":5.68,"2":7.24,"3":4.82,"4":6.95,"5":8.81,"6":8.04,"7":8.33,"8":10.84,"9":7.58,"10":9.96},"x2":{"0":4,"1":5,"2":6,"3":7,"4":8,"5":9,"6":10,"7":11,"8":12,"9":13,"10":14},"y2":{"0":3.1,"1":4.74,"2":6.13,"3":7.26,"4":8.14,"5":8.77,"6":9.14,"7":9.26,"8":9.13,"9":8.74,"10":8.1},"x3":{"0":4,"1":5,"2":6,"3":7,"4":8,"5":9,"6":10,"7":11,"8":12,"9":13,"10":14},"y3":{"0":5.39,"1":5.73,"2":6.08,"3":6.44,"4":6.77,"5":7.11,"6":7.46,"7":7.81,"8":8.15,"9":12.74,"10":8.84},"x4":{"0":8,"1":8,"2":8,"3":8,"4":8,"5":8,"6":8,"7":8,"8":8,"9":8,"10":19},"y4":{"0":6.58,"1":5.76,"2":7.71,"3":8.84,"4":8.47,"5":7.04,"6":5.25,"7":5.56,"8":7.91,"9":6.89,"10":12.5}}'))
print(df.keys())  # column names: x1, y1, ..., x4, y4
print(df.head())  # first five rows

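# Fit a separate simple linear regression to each of the four data sets
model1 = smf.ols("y1 ~ x1", data=df).fit()
model2 = smf.ols("y2 ~ x2", data=df).fit()
model3 = smf.ols("y3 ~ x3", data=df).fit()
model4 = smf.ols("y4 ~ x4", data=df).fit()

# Print the full regression summary for each fit
print(model1.summary())
print(model2.summary())
print(model3.summary())
print(model4.summary())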
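The four summaries are long, so a compact side-by-side check can be useful. The following is a minimal sketch, not part of the original post, that recomputes the intercept, slope, r² and F statistic in closed form with NumPy only. The helper name `fit_summary` and the `data` dictionary are hypothetical names introduced here; the numbers are copied from the JSON literal above.

```python
import numpy as np

# Anscombe's four data sets, copied from the JSON literal used in this post.
x123 = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
data = {
    "1": (x123, [4.26, 5.68, 7.24, 4.82, 6.95, 8.81, 8.04, 8.33, 10.84, 7.58, 9.96]),
    "2": (x123, [3.1, 4.74, 6.13, 7.26, 8.14, 8.77, 9.14, 9.26, 9.13, 8.74, 8.1]),
    "3": (x123, [5.39, 5.73, 6.08, 6.44, 6.77, 7.11, 7.46, 7.81, 8.15, 12.74, 8.84]),
    "4": ([8] * 10 + [19], [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.5]),
}


def fit_summary(x, y):
    """Closed-form simple OLS: return (intercept, slope, r2, F)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)               # centered sum of squares of x
    syy = np.sum((y - y.mean()) ** 2)               # centered sum of squares of y
    sxy = np.sum((x - x.mean()) * (y - y.mean()))   # centered cross-product
    slope = sxy / sxx
    intercept = y.mean() - slope * x.mean()
    r2 = sxy ** 2 / (sxx * syy)
    f_stat = (n - 2) * r2 / (1 - r2)                # F test of the slope, 1 and n-2 df
    return intercept, slope, r2, f_stat


summaries = {name: fit_summary(x, y) for name, (x, y) in data.items()}
for name, (b0, b1, r2, f_stat) in summaries.items():
    print(f"data set {name}: y = {b0:.2f} + {b1:.3f}x, r2 = {r2:.3f}, F = {f_stat:.2f}")
```

The closed-form route is used here only to keep the sketch free of extra dependencies; the same quantities should also be available on the fitted statsmodels results as `params`, `rsquared` and `fvalue`.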

fig = plt.figure()

ax1 = fig.add_subplot(2, 2, 1)
ax1.plot(df['x1'], df['y1'], 'ro')
ax1.plot(df['x1'], model1.predict(), 'g-')

ax2 = fig.add_subplot(2, 2, 2)
ax2.plot(df['x2'], df['y2'], 'ro')
ax2.plot(df['x2'], model2.predict(), 'g-')


ax3 = fig.add_subplot(2, 2, 3)
ax3.plot(df['x3'], df['y3'], 'ro')
ax3.plot(df['x3'], model3.predict(), 'g-')


ax4 = fig.add_subplot(2, 2, 4)
ax4.plot(df['x4'], df['y4'], 'ro')
ax4.plot(df['x4'], model4.predict(), 'g-')

plt.tight_layout()
plt.show()  # needed when running as a script; harmless in a notebook

[Figure: scatter plots of the four data sets with their fitted regression lines]

#########################
# Plot the residuals against x for each data set
#########################
fig = plt.figure()

ax = fig.add_subplot(2, 2, 1)
sns.residplot(x="x1", y="y1", data=df, ax=ax)

ax = fig.add_subplot(2, 2, 2)
sns.residplot(x="x2", y="y2", data=df, ax=ax)

ax = fig.add_subplot(2, 2, 3)
sns.residplot(x="x3", y="y3", data=df, ax=ax)

ax = fig.add_subplot(2, 2, 4)
sns.residplot(x="x4", y="y4", data=df, ax=ax)

plt.tight_layout()

[Figure: residual plots for the four data sets]
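The residual plots can be backed up with numbers. The sketch below is an illustration added here, not the original author's method: it computes raw residuals, leverages and internally studentized residuals for data sets 3 and 4 with NumPy only. The helper name `residual_diagnostics` is hypothetical, and the formulas are the standard ones for simple linear regression.

```python
import numpy as np


def residual_diagnostics(x, y):
    """Return (residuals, leverages, studentized residuals) for a simple OLS fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)
    slope = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    resid = y - (intercept + slope * x)
    # Leverage of each point: h_i = 1/n + (x_i - mean(x))^2 / Sxx
    leverage = 1.0 / n + (x - x.mean()) ** 2 / sxx
    sigma_hat = np.sqrt(np.sum(resid ** 2) / (n - 2))
    # Studentized residuals are undefined where the leverage equals 1,
    # so those entries are left as NaN instead of dividing by zero.
    studentized = np.full(n, np.nan)
    ok = leverage < 1.0 - 1e-8
    studentized[ok] = resid[ok] / (sigma_hat * np.sqrt(1.0 - leverage[ok]))
    return resid, leverage, studentized


# Data sets 3 and 4, copied from the JSON literal used in this post.
x3 = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
y3 = [5.39, 5.73, 6.08, 6.44, 6.77, 7.11, 7.46, 7.81, 8.15, 12.74, 8.84]
x4 = [8] * 10 + [19]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.5]

resid3, lev3, stud3 = residual_diagnostics(x3, y3)
resid4, lev4, stud4 = residual_diagnostics(x4, y4)

# Index of the observation with the largest absolute studentized residual.
worst3 = int(np.nanargmax(np.abs(stud3)))
print("data set 3: largest |studentized residual| at x =", x3[worst3])
print("data set 4: leverage of the x = 19 point =", round(float(lev4[-1]), 6))
```

In data set 3 a single observation stands far from the line that the other ten points follow, while in data set 4 the one point at x = 19 has leverage 1, so the fitted line is forced through it and its residual carries no information. Neither problem is visible in r² or the F statistic.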
