Contents
- Jensen's inequality
- KL divergence (a.k.a. relative entropy)
- mutual information
Jensen’s inequality
- $f\left(\int x\,p(x)\,\mathrm{d}x\right) \leqslant \int f(x)\,p(x)\,\mathrm{d}x$, which follows from Jensen's inequality $f(\mathbb{E}[x]) \leqslant \mathbb{E}[f(x)]$ for convex $f$ (see the numerical sketch after this list).
- $\mathrm{KL}(p \| q)=-\int p(\mathbf{x}) \ln \left\{\frac{q(\mathbf{x})}{p(\mathbf{x})}\right\} \mathrm{d}\mathbf{x} \geqslant -\ln \int q(\mathbf{x})\,\mathrm{d}\mathbf{x} = 0$, with equality if and only if $p(\mathbf{x}) = q(\mathbf{x})$.
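As a quick sanity check, here is a minimal numerical sketch of Jensen's inequality (my own illustration, not from the original text; the convex function $f(x)=x^2$ and the Gaussian sampling distribution are arbitrary choices):

```python
import numpy as np

# Minimal sketch of Jensen's inequality f(E[x]) <= E[f(x)] for convex f.
# Assumptions (my own, for illustration): f(x) = x**2 and x ~ N(1, 2^2).
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=100_000)

def f(t):
    return t ** 2  # a convex function

lhs = f(x.mean())   # f(E[x]), estimated from samples
rhs = f(x).mean()   # E[f(x)], estimated from samples
print(f"f(E[x]) = {lhs:.3f} <= E[f(x)] = {rhs:.3f}")
# For this choice, f(E[x]) ~ 1 while E[f(x)] = Var[x] + E[x]^2 ~ 5.
```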
KL divergence (a.k.a. relative entropy)
- Definition: $\mathrm{KL}(p \| q)=-\int p(\mathbf{x}) \ln \left\{\frac{q(\mathbf{x})}{p(\mathbf{x})}\right\} \mathrm{d}\mathbf{x}$
- $-\ln x$ is a strictly convex function, so by Jensen's inequality $\mathrm{KL}(p \| q)=-\int p(\mathbf{x}) \ln \left\{\frac{q(\mathbf{x})}{p(\mathbf{x})}\right\} \mathrm{d}\mathbf{x} \geqslant -\ln \int p(\mathbf{x})\,\frac{q(\mathbf{x})}{p(\mathbf{x})}\,\mathrm{d}\mathbf{x} = -\ln \int q(\mathbf{x})\,\mathrm{d}\mathbf{x} = 0$
- In practical applications, $\mathrm{KL}(p \| q) \simeq \frac{1}{N}\sum_{n=1}^{N}\left\{-\ln q\left(\mathbf{x}_{n} \mid \boldsymbol{\theta}\right)+\ln p\left(\mathbf{x}_{n}\right)\right\}$
- Note: in the KL definition above, the expectation is taken with respect to $p(\mathbf{x})$, so when the sample points are drawn from $p(\mathbf{x})$ the integral can be approximated by the finite sum above. In general, $\mathbb{E}[f]=\int f(x)\,p(x)\,\mathrm{d}x \simeq \frac{1}{N}\sum_{i} f(x_{i})$ with $x_i \sim p(x)$; importance sampling and related methods rely on the same idea (a code sketch follows this list).
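To make the sample-based estimate concrete, here is a minimal sketch (my own illustration; the two univariate Gaussians $p$ and $q$ are arbitrary choices): draw $\mathbf{x}_n \sim p$, average $-\ln q(\mathbf{x}_n) + \ln p(\mathbf{x}_n)$, and compare against the closed-form KL between the two Gaussians.

```python
import numpy as np

# Minimal sketch: Monte Carlo estimate of KL(p || q) from samples x_n ~ p,
# checked against the closed form for two Gaussians (my own choice of p, q).
def gauss_logpdf(x, mu, sd):
    return -0.5 * np.log(2 * np.pi * sd**2) - (x - mu) ** 2 / (2 * sd**2)

rng = np.random.default_rng(0)
mu_p, sd_p = 0.0, 1.0   # p(x) = N(0, 1): the distribution we sample from
mu_q, sd_q = 1.0, 2.0   # q(x) = N(1, 2^2): the approximating distribution

x = rng.normal(mu_p, sd_p, size=100_000)

# (1/N) * sum_n { -ln q(x_n) + ln p(x_n) }
kl_mc = np.mean(gauss_logpdf(x, mu_p, sd_p) - gauss_logpdf(x, mu_q, sd_q))

# Closed form: KL(N(m1,s1^2) || N(m2,s2^2))
#            = ln(s2/s1) + (s1^2 + (m1-m2)^2) / (2 s2^2) - 1/2
kl_exact = np.log(sd_q / sd_p) + (sd_p**2 + (mu_p - mu_q)**2) / (2 * sd_q**2) - 0.5

print(f"Monte Carlo KL ~ {kl_mc:.4f}, exact KL = {kl_exact:.4f}")
```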
mutual information
1. If the variables $\mathbf{x}$ and $\mathbf{y}$ in a data set are not independent, we can ask how far the joint distribution is from the factorized approximation $p(\mathbf{x})p(\mathbf{y})$; the KL divergence between the two gives the mutual information:
$$\begin{aligned} \mathrm{I}[\mathbf{x}, \mathbf{y}] & \equiv \mathrm{KL}(p(\mathbf{x},\mathbf{y})\,\|\,p(\mathbf{x})p(\mathbf{y})) \\ &=-\iint p(\mathbf{x}, \mathbf{y})\ln\left(\frac{p(\mathbf{x})\, p(\mathbf{y})}{p(\mathbf{x}, \mathbf{y})}\right) \mathrm{d}\mathbf{x}\,\mathrm{d}\mathbf{y} \end{aligned}$$
2. Using the sum and product rules of probability, mutual information can be expressed in terms of conditional entropy:
$$\mathrm{I}[\mathbf{x}, \mathbf{y}]=\mathrm{H}[\mathbf{x}]-\mathrm{H}[\mathbf{x} \mid \mathbf{y}]=\mathrm{H}[\mathbf{y}]-\mathrm{H}[\mathbf{y} \mid \mathbf{x}]$$
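For a concrete check of both formulas, here is a minimal sketch with a discrete joint distribution (the 2×2 table is my own arbitrary choice): compute $\mathrm{I}[\mathbf{x},\mathbf{y}]$ directly from the definition and verify that it equals $\mathrm{H}[\mathbf{x}]-\mathrm{H}[\mathbf{x} \mid \mathbf{y}]$.

```python
import numpy as np

# Minimal sketch: mutual information for a discrete joint p(x, y).
# The 2x2 joint table is an arbitrary choice for illustration.
p_xy = np.array([[0.3, 0.1],
                 [0.2, 0.4]])        # rows index x, columns index y; sums to 1
p_x = p_xy.sum(axis=1)               # marginal p(x)
p_y = p_xy.sum(axis=0)               # marginal p(y)

# Definition: I[x, y] = sum_{x,y} p(x,y) ln( p(x,y) / (p(x) p(y)) )
I = np.sum(p_xy * np.log(p_xy / np.outer(p_x, p_y)))

# Check I = H[x] - H[x|y], using H[x|y] = H[x, y] - H[y].
H_x = -np.sum(p_x * np.log(p_x))
H_y = -np.sum(p_y * np.log(p_y))
H_joint = -np.sum(p_xy * np.log(p_xy))
print(f"I = {I:.4f}, H[x] - H[x|y] = {H_x - (H_joint - H_y):.4f}")  # both ~0.0863
```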