Xavier initialization can use a uniform distribution U(-a, a), where a is computed as:
a = gain \times \sqrt{\frac{6}{fan\_in + fan\_out}}

Xavier initialization can also use a normal distribution N(0, std), where std is computed as:
std = gain \times \sqrt{\frac{2}{fan\_in + fan\_out}}

Here fan_in and fan_out are the numbers of input and output neurons, respectively; in a fully connected layer, they equal the number of input and output features.
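The two Xavier formulas above are easy to compute by hand. A minimal sketch (the layer sizes 256/128 below are made-up example values, and gain defaults to 1 as it does in torch.nn.init):

```python
import math

def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    # a = gain * sqrt(6 / (fan_in + fan_out))
    return gain * math.sqrt(6.0 / (fan_in + fan_out))

def xavier_normal_std(fan_in, fan_out, gain=1.0):
    # std = gain * sqrt(2 / (fan_in + fan_out))
    return gain * math.sqrt(2.0 / (fan_in + fan_out))

# hypothetical fully connected layer: 256 input features, 128 output features
bound = xavier_uniform_bound(256, 128)  # sqrt(6/384) = 0.125
std = xavier_normal_std(256, 128)       # sqrt(2/384) ≈ 0.0722
```

These are the same quantities torch.nn.init.xavier_uniform_ and torch.nn.init.xavier_normal_ compute internally before sampling.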
Kaiming uniform initialization uses U(-bound, bound), where bound is computed as follows (the meaning of a is explained further down):
bound = \sqrt{\frac{6}{(1 + a^2) \times fan\_in}}

One additional note: PyTorch implements this formula with gain as an intermediate variable, i.e.:
bound = gain \times \sqrt{\frac{3}{fan\_in}}

where:
gain = \sqrt{\frac{2}{1 + a^2}}

Kaiming normal initialization uses N(0, std), where std is computed as:
std = \sqrt{\frac{2}{(1 + a^2) \times fan\_in}}

A brief explanation of what a means here: the source code describes it as
"the negative slope of the rectifier used after this layer"
fan = _calculate_correct_fan(tensor, mode)
gain = calculate_gain(nonlinearity, a)
std = gain / math.sqrt(fan)
bound = math.sqrt(3.0) * std  # Calculate uniform bounds from standard deviation
with torch.no_grad():
    return tensor.uniform_(-bound, bound)
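To confirm that the direct form of the Kaiming bound and PyTorch's gain-based route give the same value, here is a small sketch (fan_in = 512 and the slope 0.01 are arbitrary example values; a = 0 corresponds to plain ReLU):

```python
import math

def kaiming_bound_direct(fan_in, a=0.0):
    # direct form: bound = sqrt(6 / ((1 + a^2) * fan_in))
    return math.sqrt(6.0 / ((1 + a ** 2) * fan_in))

def kaiming_bound_via_gain(fan_in, a=0.0):
    # PyTorch's route: gain = sqrt(2 / (1 + a^2)), bound = gain * sqrt(3 / fan_in)
    gain = math.sqrt(2.0 / (1 + a ** 2))
    return gain * math.sqrt(3.0 / fan_in)

# a = 0 for ReLU; a = 0.01 for leaky_relu's default negative slope
for a in (0.0, 0.01):
    direct = kaiming_bound_direct(512, a)
    via_gain = kaiming_bound_via_gain(512, a)
    assert abs(direct - via_gain) < 1e-12  # the two forms are algebraically identical
```

Substituting gain into the gain-based form recovers the direct form, since sqrt(2/(1+a^2)) * sqrt(3/fan_in) = sqrt(6/((1+a^2)*fan_in)).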