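Earlier posts covered several methods for identifying causal relationships: randomized controlled trials, backdoor adjustment, frontdoor adjustment, and do-calculus. Today we introduce another method for identifying causal effects: instrumental variables.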
1. What is an instrumental variable?
In the causal graph above, $Z$ is an instrumental variable: it can be used to compute the causal effect of $T$ on $Y$ even when $U$ is unobserved.
Criteria for an instrumental variable:
- (Relevance) $Z$ is a direct cause of $T$.
- (Exclusion Restriction) The causal effect of $Z$ on $Y$ is fully mediated by $T$.
- (Instrumental Unconfoundedness) There are no unblocked backdoor paths from $Z$ to $Y$.
Only a variable that satisfies all three criteria qualifies as an instrumental variable and can be used to identify causal effects.
2. Instrumental variables cannot nonparametrically identify the ATE
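Unlike the ATE identification methods introduced in Chapter 5, instrumental variables cannot nonparametrically identify the ATE. When the ATE is identified through backdoor adjustment, frontdoor adjustment, or do-calculus, no assumption about the parametric form of the structural equations is needed. Identifying the ATE with an instrumental variable, by contrast, must rest on an assumption about the parametric form (for example, linearity).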
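A necessary condition for a causal effect to be nonparametrically identifiable is that every backdoor path from $T$ to any node $M$ that is both a descendant of $T$ and an ancestor of $Y$ can be blocked (see the end of Chapter 5). In the causal graph above, $Y$ is a child of $T$ and, treated as its own ancestor, plays the role of $M$. The backdoor path $T \leftarrow U \rightarrow Y$ from $T$ to $Y$ cannot be blocked because $U$ is unobserved, so the causal effect of $T$ on $Y$ cannot be nonparametrically identified.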
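To make the unblocked backdoor path concrete, the sketch below simulates a small model in which $U$ drives both $T$ and $Y$ but is never recorded. The data-generating process (`simulate_confounded`, the coefficients 0.1, 0.5, 0.3, and the linear outcome with effect 2.0) is an assumption chosen purely for illustration, not something taken from the text. It only shows that the unadjusted contrast is biased; it does not by itself prove non-identifiability.

```python
import random


def simulate_confounded(n, delta=2.0, alpha_u=3.0, seed=0):
    """Draw (t, y) pairs from a hypothetical model in which U is never recorded."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.randint(0, 1)              # instrument
        u = rng.randint(0, 1)              # unobserved confounder
        # Assumed treatment mechanism: both Z and U raise P(T = 1).
        t = 1 if rng.random() < 0.1 + 0.5 * z + 0.3 * u else 0
        y = delta * t + alpha_u * u        # assumed outcome mechanism
        rows.append((t, y))
    return rows


def naive_difference(rows):
    """E[Y | T=1] - E[Y | T=0], with no adjustment for U."""
    treated = [y for t, y in rows if t == 1]
    control = [y for t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


rows = simulate_confounded(n=100_000)
naive = naive_difference(rows)
print(f"naive contrast: {naive:.2f}, true effect: 2.00")
```

Under these assumed parameters the naive contrast is about 2.9 in expectation rather than 2.0, because the treated group contains more units with $U=1$.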
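3. Identifying the ATE with an instrumental variable in the binary linear setting
We start with the simplest case: identifying the ATE with an instrumental variable in the binary linear setting.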
Assume $Y := \delta T + \alpha_u U$, where $T$ and $Z$ are both binary.
Under this assumption the causal effect of $T$ on $Y$ is $\delta$, and we need to estimate $\delta$ using $Z$.
$$
\begin{aligned}
\mathbb{E}[Y \mid Z=1]-\mathbb{E}[Y \mid Z=0]
&=\mathbb{E}[\delta T+\alpha_{u} U \mid Z=1]-\mathbb{E}[\delta T+\alpha_{u} U \mid Z=0] \\
&=\delta\left(\mathbb{E}[T \mid Z=1]-\mathbb{E}[T \mid Z=0]\right)+\alpha_{u}\left(\mathbb{E}[U \mid Z=1]-\mathbb{E}[U \mid Z=0]\right) \\
&=\delta\left(\mathbb{E}[T \mid Z=1]-\mathbb{E}[T \mid Z=0]\right)
\end{aligned}
$$
The term $\alpha_{u}\left(\mathbb{E}[U \mid Z=1]-\mathbb{E}[U \mid Z=0]\right)$ is dropped in the last step because, by the third criterion (instrumental unconfoundedness), $Z$ and $U$ are independent. Hence

$$
\delta=\frac{\mathbb{E}[Y \mid Z=1]-\mathbb{E}[Y \mid Z=0]}{\mathbb{E}[T \mid Z=1]-\mathbb{E}[T \mid Z=0]}
$$

Note that, by the first criterion (relevance), the denominator is nonzero.
Next, we plug in empirical means for these conditional expectations, which gives the Wald estimator:

$$
\hat{\delta}=\frac{\frac{1}{n_{1}} \sum_{i: z_{i}=1} Y_{i}-\frac{1}{n_{0}} \sum_{i: z_{i}=0} Y_{i}}{\frac{1}{n_{1}} \sum_{i: z_{i}=1} T_{i}-\frac{1}{n_{0}} \sum_{i: z_{i}=0} T_{i}}
$$

where $n_1$ is the number of samples with $z=1$ and $n_0$ is the number of samples with $z=0$.
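The Wald estimator above can be sketched in a few lines of Python. The simulator `simulate_binary_iv` and its treatment mechanism (coefficients 0.1, 0.5, 0.3) are hypothetical choices made for this illustration. Only the outcome equation $Y := \delta T + \alpha_u U$ and the estimator itself come from the text.

```python
import random


def simulate_binary_iv(n, delta, alpha_u, seed=0):
    """Draw n samples (z, t, y) from a hypothetical binary-IV model.

    Z and U are independent coin flips, T depends on both Z and U,
    and Y := delta * T + alpha_u * U as in the text.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        z = rng.randint(0, 1)              # instrument
        u = rng.randint(0, 1)              # unobserved confounder
        # Assumed treatment mechanism: P(T = 1) rises with both Z and U.
        p_t = 0.1 + 0.5 * z + 0.3 * u
        t = 1 if rng.random() < p_t else 0
        y = delta * t + alpha_u * u
        samples.append((z, t, y))
    return samples


def wald_estimate(samples):
    """Difference in mean Y across Z divided by the difference in mean T across Z."""
    y1 = [y for z, _, y in samples if z == 1]
    y0 = [y for z, _, y in samples if z == 0]
    t1 = [t for z, t, _ in samples if z == 1]
    t0 = [t for z, t, _ in samples if z == 0]
    numerator = sum(y1) / len(y1) - sum(y0) / len(y0)
    denominator = sum(t1) / len(t1) - sum(t0) / len(t0)
    return numerator / denominator


samples = simulate_binary_iv(n=200_000, delta=2.0, alpha_u=3.0)
delta_hat = wald_estimate(samples)
print(f"Wald estimate: {delta_hat:.3f} (true delta = 2.0)")
```

Note that `wald_estimate` never touches $U$: the estimator needs only the observed triples $(z_i, t_i, y_i)$. It becomes unstable when the denominator is close to zero, that is, when the instrument is only weakly relevant.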
4. Identifying the ATE with an instrumental variable in the continuous linear setting
Again assume $Y := \delta T + \alpha_u U$. When $T$ and $Z$ are both continuous,

$$
\delta=\frac{\operatorname{Cov}(Y, Z)}{\operatorname{Cov}(T, Z)}
$$

Proof:
$$
\begin{aligned}
\operatorname{Cov}(Y, Z) &=\mathbb{E}[Y Z]-\mathbb{E}[Y] \mathbb{E}[Z] \\
&=\mathbb{E}\left[\left(\delta T+\alpha_{u} U\right) Z\right]-\mathbb{E}\left[\delta T+\alpha_{u} U\right] \mathbb{E}[Z] \\
&=\delta \mathbb{E}[T Z]+\alpha_{u} \mathbb{E}[U Z]-\delta \mathbb{E}[T] \mathbb{E}[Z]-\alpha_{u} \mathbb{E}[U] \mathbb{E}[Z] \\
&=\delta(\mathbb{E}[T Z]-\mathbb{E}[T] \mathbb{E}[Z])+\alpha_{u}(\mathbb{E}[U Z]-\mathbb{E}[U] \mathbb{E}[Z]) \\
&=\delta \operatorname{Cov}(T, Z)+\alpha_{u} \operatorname{Cov}(U, Z) \\
&=\delta \operatorname{Cov}(T, Z)
\end{aligned}
$$

The last step uses $\operatorname{Cov}(U, Z)=0$, which holds because $Z$ and $U$ are independent (instrumental unconfoundedness). Dividing both sides by $\operatorname{Cov}(T, Z)$, which is nonzero by relevance, gives the result.
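As a rough numerical check of the covariance ratio, the sketch below simulates a continuous linear model and compares the ratio $\operatorname{Cov}(Y,Z)/\operatorname{Cov}(T,Z)$ with the ordinary regression slope of $Y$ on $T$. The treatment equation (coefficients 1.5 and 1.0, standard normal noise) and the function names are assumptions made for this example only.

```python
import random


def covariance(xs, ys):
    """Sample covariance with 1/n normalisation (the ratio below is unaffected)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


def simulate_continuous_iv(n, delta, alpha_u, seed=0):
    """Draw (z, t, y) columns from a hypothetical continuous linear IV model."""
    rng = random.Random(seed)
    zs, ts, ys = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)            # instrument
        u = rng.gauss(0.0, 1.0)            # unobserved confounder
        # Assumed linear treatment mechanism with its own noise term.
        t = 1.5 * z + 1.0 * u + rng.gauss(0.0, 1.0)
        y = delta * t + alpha_u * u        # outcome equation from the text
        zs.append(z)
        ts.append(t)
        ys.append(y)
    return zs, ts, ys


zs, ts, ys = simulate_continuous_iv(n=100_000, delta=2.0, alpha_u=3.0)
iv_estimate = covariance(ys, zs) / covariance(ts, zs)     # Cov(Y,Z) / Cov(T,Z)
ols_estimate = covariance(ys, ts) / covariance(ts, ts)    # slope of Y on T, confounded by U
print(f"IV estimate:  {iv_estimate:.3f}")
print(f"OLS estimate: {ols_estimate:.3f}")
```

With these assumed parameters the IV ratio should land close to the true $\delta = 2$, while the regression slope is pulled upward by the confounder $U$.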
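5. Nonparametric identification of the local ATE with an instrumental variable
Using principal strata, the population is divided into four strata according to the potential treatments $T(z)$, the treatment a unit would take if $Z$ were set to $z$: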
- Compliers: $T(1)=1$ and $T(0)=0$
- Always-takers: $T(1)=1$ and $T(0)=1$
- Never-takers: $T(1)=0$ and $T(0)=0$
- Defiers: $T(1)=0$ and $T(0)=1$
The local ATE (LATE), also called the complier average causal effect (CACE), is defined as:

$$
\mathbb{E}[Y(T=1)-Y(T=0) \mid T(Z=1)=1, T(Z=0)=0]
$$
Monotonicity assumption:

$$
\forall i, \quad T_{i}(Z=1) \geq T_{i}(Z=0)
$$

In other words, there are no defiers.
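The four strata and the monotonicity condition can be written down directly as code. The sketch below is illustrative only: the helper names and the example units are hypothetical, and in real data the pair $(T(0), T(1))$ is never observed together for a single unit, so strata cannot be read off like this.

```python
def principal_stratum(t_if_z0, t_if_z1):
    """Name the stratum of a unit with potential treatments T(0) and T(1)."""
    strata = {
        (0, 1): "complier",
        (1, 1): "always-taker",
        (0, 0): "never-taker",
        (1, 0): "defier",
    }
    return strata[(t_if_z0, t_if_z1)]


def satisfies_monotonicity(units):
    """True if T_i(1) >= T_i(0) for every unit, i.e. there are no defiers."""
    return all(t1 >= t0 for t0, t1 in units)


# Hypothetical (T(0), T(1)) pairs for four units.
units = [(0, 1), (1, 1), (0, 0), (0, 1)]
print([principal_stratum(t0, t1) for t0, t1 in units])
# ['complier', 'always-taker', 'never-taker', 'complier']
print(satisfies_monotonicity(units))             # True
print(satisfies_monotonicity(units + [(1, 0)]))  # False: the added unit is a defier
```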
Under the assumptions above, the local ATE is identified as:

$$
\mathbb{E}[Y(1)-Y(0) \mid T(1)=1, T(0)=0]=\frac{\mathbb{E}[Y \mid Z=1]-\mathbb{E}[Y \mid Z=0]}{\mathbb{E}[T \mid Z=1]-\mathbb{E}[T \mid Z=0]}
$$
Proof:
Partition the population into the four principal strata:

$$
\begin{aligned}
\mathbb{E}[Y(Z=1)-Y(Z=0)]
={} & \mathbb{E}[Y(Z=1)-Y(Z=0) \mid T(1)=1, T(0)=0]\, P(T(1)=1, T(0)=0) \\
& +\mathbb{E}[Y(Z=1)-Y(Z=0) \mid T(1)=0, T(0)=1]\, P(T(1)=0, T(0)=1) \\
& +\mathbb{E}[Y(Z=1)-Y(Z=0) \mid T(1)=1, T(0)=1]\, P(T(1)=1, T(0)=1) \\
& +\mathbb{E}[Y(Z=1)-Y(Z=0) \mid T(1)=0, T(0)=0]\, P(T(1)=0, T(0)=0)
\end{aligned}
$$

For always-takers and never-takers, $T$ does not depend on $Z$. Since $Z$ affects $Y$ only through $T$ (exclusion restriction), $Z$ has no causal effect on $Y$ in these strata, so the last two terms vanish:
$$
\begin{aligned}
\mathbb{E}[Y(Z=1)-Y(Z=0)]
={} & \mathbb{E}[Y(Z=1)-Y(Z=0) \mid T(1)=1, T(0)=0]\, P(T(1)=1, T(0)=0) \\
& +\mathbb{E}[Y(Z=1)-Y(Z=0) \mid T(1)=0, T(0)=1]\, P(T(1)=0, T(0)=1)
\end{aligned}
$$

By the monotonicity assumption there are no defiers, so the second term vanishes as well:
$$
\mathbb{E}[Y(Z=1)-Y(Z=0)]
=\mathbb{E}[Y(Z=1)-Y(Z=0) \mid T(1)=1, T(0)=0]\, P(T(1)=1, T(0)=0)
$$

We can now solve for the causal effect of $Z$ on $Y$ among compliers:
$$
\mathbb{E}[Y(Z=1)-Y(Z=0) \mid T(1)=1, T(0)=0]=\frac{\mathbb{E}[Y(Z=1)-Y(Z=0)]}{P(T(1)=1, T(0)=0)}
$$

For compliers, $Y(Z=1)=Y(T=1)$ and $Y(Z=0)=Y(T=0)$, so
$$
\begin{aligned}
\mathbb{E}[Y(T=1)-Y(T=0) \mid T(1)=1, T(0)=0]
&=\frac{\mathbb{E}[Y(Z=1)-Y(Z=0)]}{P(T(1)=1, T(0)=0)} \\
&=\frac{\mathbb{E}[Y \mid Z=1]-\mathbb{E}[Y \mid Z=0]}{P(T(1)=1, T(0)=0)}
\end{aligned}
$$

The last equality holds because $Z$ is unconfounded (instrumental unconfoundedness). Because there are no defiers, the probability of being a complier equals the total probability minus the probability of being a never-taker, $P(T=0 \mid Z=1)$, and the probability of being an always-taker, $P(T=1 \mid Z=0)$. Hence
$$
\begin{aligned}
\mathbb{E}[Y(T=1)-Y(T=0) \mid T(1)=1, T(0)=0]
&=\frac{\mathbb{E}[Y \mid Z=1]-\mathbb{E}[Y \mid Z=0]}{1-P(T=0 \mid Z=1)-P(T=1 \mid Z=0)} \\
&=\frac{\mathbb{E}[Y \mid Z=1]-\mathbb{E}[Y \mid Z=0]}{1-(1-P(T=1 \mid Z=1))-P(T=1 \mid Z=0)} \\
&=\frac{\mathbb{E}[Y \mid Z=1]-\mathbb{E}[Y \mid Z=0]}{P(T=1 \mid Z=1)-P(T=1 \mid Z=0)} \\
&=\frac{\mathbb{E}[Y \mid Z=1]-\mathbb{E}[Y \mid Z=0]}{\mathbb{E}[T \mid Z=1]-\mathbb{E}[T \mid Z=0]}
\end{aligned}
$$
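The result just derived can be checked numerically. The sketch below simulates a population with no defiers in which the treatment effect differs across strata, then compares the Wald ratio with the average effect among compliers. The stratum shares, baselines, and effects in `STRATA` are invented for this illustration, and the function names are hypothetical.

```python
import random

# Hypothetical strata: (name, share, T(0), T(1), baseline outcome, treatment effect).
STRATA = [
    ("complier",     0.5, 0, 1,  0.0, 1.0),
    ("always-taker", 0.2, 1, 1,  2.0, 4.0),
    ("never-taker",  0.3, 0, 0, -1.0, 3.0),
]
COMPLIER_EFFECT = 1.0
# Share-weighted average effect over the whole population (about 2.2).
POPULATION_ATE = sum(s[1] * s[5] for s in STRATA)


def simulate_late(n, seed=0):
    """Draw observed (z, t, y) samples; strata are used internally but not returned."""
    rng = random.Random(seed)
    weights = [s[1] for s in STRATA]
    samples = []
    for _ in range(n):
        _, _, t0, t1, baseline, effect = rng.choices(STRATA, weights=weights)[0]
        z = rng.randint(0, 1)              # randomised instrument
        t = t1 if z == 1 else t0           # observed treatment T = T(Z)
        y = baseline + effect * t + rng.gauss(0.0, 1.0)
        samples.append((z, t, y))
    return samples


def wald_ratio(samples):
    """(E[Y|Z=1] - E[Y|Z=0]) / (E[T|Z=1] - E[T|Z=0]) from empirical means."""
    y1 = [y for z, _, y in samples if z == 1]
    y0 = [y for z, _, y in samples if z == 0]
    t1 = [t for z, t, _ in samples if z == 1]
    t0 = [t for z, t, _ in samples if z == 0]
    return (sum(y1) / len(y1) - sum(y0) / len(y0)) / (sum(t1) / len(t1) - sum(t0) / len(t0))


late_hat = wald_ratio(simulate_late(n=200_000))
print(f"Wald ratio: {late_hat:.3f}")
print(f"complier effect: {COMPLIER_EFFECT}, population ATE: {POPULATION_ATE:.1f}")
```

In this setup the ratio should track the complier effect rather than the population ATE, which matches the "local" in local ATE: always-takers and never-takers contribute nothing to either the numerator or the denominator.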
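The identification results above are limited to settings that are either:
- linear (the ATE results in Sections 3 and 4), or
- consistent with the monotonicity assumption (the local ATE result in this section, which applies only to compliers).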
6. More general settings for ATE identification