In its derivation of the evidence lower bound (ELBO) for variational diffusion models (VDM), Understanding[1] uses a notation in Eq. (53) that is easy to misread:
$$
\begin{aligned}
\dots &= \mathbb{E}_{q\left(\boldsymbol{x}_{1: T} \mid \boldsymbol{x}_0\right)}\left[\log \frac{p\left(\boldsymbol{x}_T\right) p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_0 \mid \boldsymbol{x}_1\right)}{q\left(\boldsymbol{x}_1 \mid \boldsymbol{x}_0\right)}+\log \prod_{t=2}^T \frac{p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t\right)}{\frac{q\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t, \boldsymbol{x}_0\right) \cancel{q\left(\boldsymbol{x}_t \mid \boldsymbol{x}_0\right)}}{\cancel{q\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_0\right)}} }\right] & (53) \\
&= \mathbb{E}_{q\left(\boldsymbol{x}_{1: T} \mid \boldsymbol{x}_0\right)}\left[\log \frac{p\left(\boldsymbol{x}_T\right) p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_0 \mid \boldsymbol{x}_1\right)}{\cancel{q\left(\boldsymbol{x}_1 \mid \boldsymbol{x}_0\right)}}+\log \frac{\cancel{q\left(\boldsymbol{x}_1 \mid \boldsymbol{x}_0\right)}}{q\left(\boldsymbol{x}_T \mid \boldsymbol{x}_0\right)}+\log \prod_{t=2}^T \frac{p_{\boldsymbol{\theta}}\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t\right)}{q\left(\boldsymbol{x}_{t-1} \mid \boldsymbol{x}_t, \boldsymbol{x}_0\right)}\right] & (54)
\end{aligned}
$$
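The second term of (53) looks like a cancellation, yet the two struck-out factors $q(x_t \mid x_0)$ and $q(x_{t-1} \mid x_0)$ are clearly not equal. Meanwhile, the second term of (54) is clearly nonzero, and it is not obvious where it comes from. Only after hearing [2] mention "splitting apart" and reading the Q&A in the comments of [3] did I understand: the derivation pulls the struck-out factors of (53) out into a separate $\log$ term. Cancellation does happen inside that term (the product telescopes), and what survives is the second term of (54), i.e.: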
$$
\begin{aligned}
\text{second term of (53)} &= \log \prod_{t=2}^T \frac{p_{\theta}\left(x_{t-1} \mid x_t\right)}{\frac{q\left(x_{t-1} \mid x_t, x_0\right)q\left(x_t \mid x_0\right)}{q\left(x_{t-1} \mid x_0\right)} } \\
&= \log \prod_{t=2}^T \left[\frac{p_{\theta}\left(x_{t-1} \mid x_t\right)}{q\left(x_{t-1} \mid x_t, x_0\right)} \cdot \frac{q\left(x_{t-1} \mid x_0\right)}{q\left(x_t \mid x_0\right)}\right] \\
&= \underbrace{\log \prod_{t=2}^T \frac{p_{\theta}\left(x_{t-1} \mid x_t\right)}{q\left(x_{t-1} \mid x_t, x_0\right)}}_{\text{third term of (54)}} + \log \prod_{t=2}^T \frac{q\left(x_{t-1} \mid x_0\right)}{q\left(x_t \mid x_0\right)} \\
&= \log \left[ \frac{q\left(x_1 \mid x_0\right)}{\cancel{q\left(x_2 \mid x_0\right)}} \times \frac{\cancel{q\left(x_2 \mid x_0\right)}}{\cancel{q\left(x_3 \mid x_0\right)}} \times \cdots \times \frac{\cancel{q\left(x_{T-1} \mid x_0\right)}}{q\left(x_T \mid x_0\right)}\right] + \text{third term of (54)} \\
&= \underbrace{\log \frac{q\left(x_1 \mid x_0\right)}{q\left(x_T \mid x_0\right)}}_{\text{second term of (54)}} + \text{third term of (54)}
\end{aligned}
$$
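The telescoping step can also be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: it uses scalar $x_t$, the Gaussian marginal $q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar{\alpha}_t}\, x_0, 1 - \bar{\alpha}_t)$ in the Understanding / DDPM notation, and a hypothetical linear $\beta$ schedule. The function names `log_normal_pdf` and `telescoping_check` are made up for this example and do not come from any of the cited papers.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a scalar Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def telescoping_check(T=10, x0=0.7, seed=0):
    """Return (lhs, rhs) of the telescoping identity for scalar x_1..x_T.

    lhs = sum_{t=2}^{T} [log q(x_{t-1} | x_0) - log q(x_t | x_0)]
    rhs = log q(x_1 | x_0) - log q(x_T | x_0)
    """
    rng = random.Random(seed)

    # Hypothetical linear beta schedule (an assumption, not from the source).
    betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]

    # alpha_bar[t] = prod_{s<=t} (1 - beta_s) for t = 1..T; alpha_bar[0] = 1.
    alpha_bar = [1.0]
    for beta in betas:
        alpha_bar.append(alpha_bar[-1] * (1.0 - beta))

    # Arbitrary values for x_1..x_T (index 0 is a placeholder). The identity
    # is purely algebraic, so the values need not form a joint sample path.
    xs = [None] + [
        math.sqrt(alpha_bar[t]) * x0
        + math.sqrt(1.0 - alpha_bar[t]) * rng.gauss(0.0, 1.0)
        for t in range(1, T + 1)
    ]

    def log_q(t):
        """log q(x_t | x_0) for t in 1..T."""
        return log_normal_pdf(
            xs[t], math.sqrt(alpha_bar[t]) * x0, 1.0 - alpha_bar[t]
        )

    lhs = sum(log_q(t - 1) - log_q(t) for t in range(2, T + 1))
    rhs = log_q(1) - log_q(T)
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = telescoping_check()
    print(lhs, rhs)  # the two values should agree up to rounding error
```

Since every intermediate $\log q(x_t \mid x_0)$ for $2 \le t \le T-1$ appears once with a plus sign and once with a minus sign, the two returned values agree for any choice of $x_1, \dots, x_T$, not only for samples from the forward process.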
DDIM
When DDIM[7] reviews DDPM[5] in its Section 2 (Background), the form of $q(x_t \mid x_{t-1})$ in its Eq. (3) differs from both Eq. (31) of Understanding[1] and Eq. (2) of DDPM. Comparing the unnumbered equation just below its Eq. (3) with Eq. (70) of Understanding and Eq. (4) of DDPM shows that $\alpha_t$ in DDIM actually corresponds to $\bar{\alpha}_t$ in Understanding / DDPM. Appendix C.2 of DDIM also states this explicitly.
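The correspondence can be made concrete with a small conversion sketch. It assumes the DDPM parameterization $\alpha_t = 1 - \beta_t$, $\bar{\alpha}_t = \prod_{s \le t} \alpha_s$, the DDIM form $q(x_t \mid x_{t-1}) = \mathcal{N}\big(\sqrt{\alpha_t / \alpha_{t-1}}\, x_{t-1}, (1 - \alpha_t / \alpha_{t-1}) I\big)$, and the convention $\alpha_0 := 1$ in DDIM's notation. The helper names `ddpm_to_ddim_alphas` and `ddim_step_coefficients` are hypothetical and exist only for this illustration.

```python
import math


def ddpm_to_ddim_alphas(betas):
    """Map DDPM betas to DDPM alpha_t, DDPM alpha_bar_t, and DDIM-style alpha_t.

    The DDIM-style alpha_t is simply the DDPM cumulative product alpha_bar_t.
    """
    ddpm_alpha = [1.0 - b for b in betas]
    ddpm_alpha_bar = []
    running = 1.0
    for a in ddpm_alpha:
        running *= a
        ddpm_alpha_bar.append(running)
    ddim_alpha = list(ddpm_alpha_bar)  # same numbers, different name
    return ddpm_alpha, ddpm_alpha_bar, ddim_alpha


def ddim_step_coefficients(ddim_alpha):
    """Per-step (mean scale, variance) of q(x_t | x_{t-1}) in DDIM's notation."""
    coeffs = []
    prev = 1.0  # assumed convention: alpha_0 := 1
    for a in ddim_alpha:
        ratio = a / prev  # equals the DDPM alpha_t = 1 - beta_t
        coeffs.append((math.sqrt(ratio), 1.0 - ratio))
        prev = a
    return coeffs


if __name__ == "__main__":
    betas = [0.1, 0.2, 0.3]  # toy values chosen for readability
    ddpm_alpha, ddpm_alpha_bar, ddim_alpha = ddpm_to_ddim_alphas(betas)
    for beta, (scale, var) in zip(betas, ddim_step_coefficients(ddim_alpha)):
        print(beta, scale ** 2, var)
```

Under these assumptions the ratio $\alpha_t / \alpha_{t-1}$ in DDIM's notation recovers the per-step $\alpha_t = 1 - \beta_t$ of DDPM, so the two forms of $q(x_t \mid x_{t-1})$ describe the same transition.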
References
Understanding Diffusion Models: A Unified Perspective - arXiv, blog (abbr.: Understanding)