Forward and Backward Propagation of a Convolutional Layer
Notes
This post implements the forward and backward passes of a single convolutional layer only (with the ReLU activation), supporting multi-channel input and multi-channel output. My understanding of deep learning used to stop at PyTorch's ready-made APIs; this is my first attempt to do without import torch, so please point out any mistakes.
To represent tensors more conveniently, we use the NumPy package here.
Forward propagation
Principle
a^{l}=\sigma\left(z^{l}\right)=\sigma\left(a^{l-1} * W^{l}+b^{l}\right)
where \sigma is the activation function (ReLU in this post); a^{l-1} is the input of this convolutional layer; a^{l} is its output after the activation function; z^{l} is the output before the activation function; W^{l} is the convolution kernel; and b^{l} is the bias of this layer.
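As a tiny single-channel illustration of the formula, consider a 3x3 input, one 2x2 kernel, stride 1 and no padding (all numbers below are made up for illustration):

import numpy as np
a_prev = np.array([[1., 2., 0.],
                   [0., 1., 3.],
                   [2., 1., 1.]])     # a^{l-1}: a 3x3 single-channel input
kernel = np.array([[1., 0.],
                   [-1., 2.]])        # W^{l}: a 2x2 kernel
b = 0.5                               # b^{l}
# the top-left entry of z^{l}: the 2x2 window times the kernel, summed, plus the bias
z00 = np.sum(a_prev[0:2, 0:2] * kernel) + b   # 1*1 + 2*0 + 0*(-1) + 1*2 + 0.5 = 3.5
a00 = max(z00, 0.0)                           # ReLU keeps the positive value unchanged
print(z00, a00)                               # 3.5 3.5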
Code
The forward pass of the ReLU activation keeps values greater than or equal to zero unchanged and sets negative values to zero.
def relu(z):
    z[z < 0] = 0   # zero out the negative entries in place
    return z
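For instance (the input array here is only an illustration); note that relu modifies its argument in place and also returns it:

import numpy as np
print(relu(np.array([-1.0, 0.0, 2.5])))   # -> [0.  0.  2.5]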
The code for the forward propagation of the convolutional layer (with the ReLU activation) is given below.
X, W, b correspond to a^{l-1}, W^{l}, b^{l} respectively; the output Out corresponds to a^{l}.
def conv_forward(X, W, b, stride=(1, 1), padding=(0, 0)):
    m, c, Ih, Iw = X.shape          # batch size, input channels, input height/width
    f, _, Kh, Kw = W.shape          # output channels, kernel height/width
    Sw, Sh = stride
    Pw, Ph = padding
    Oh = int(1 + (Ih + 2 * Ph - Kh) / Sh)   # output height
    Ow = int(1 + (Iw + 2 * Pw - Kw) / Sw)   # output width
    Out = np.zeros([m, f, Oh, Ow])
    X_pad = np.zeros((m, c, Ih + 2 * Ph, Iw + 2 * Pw))
    X_pad[:, :, Ph:Ph + Ih, Pw:Pw + Iw] = X  # zero-pad the input
    for n in range(Out.shape[1]):            # output channel
        for i in range(Out.shape[2]):        # output row
            for j in range(Out.shape[3]):    # output column
                Out[:, n, i, j] = np.sum(X_pad[:, :, i*Sh : i*Sh+Kh, j*Sw : j*Sw+Kw] * W[n, :, :, :], axis=(1, 2, 3))
        Out[:, n, :, :] += b[n]              # add the bias of this output channel
    Out = relu(Out)                          # apply the ReLU activation
    return Out
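A minimal sanity check of the output shape, with made-up tensor sizes:

import numpy as np
np.random.seed(0)
X = np.random.randn(2, 3, 5, 5)    # 2 samples, 3 input channels, 5x5
W = np.random.randn(4, 3, 3, 3)    # 4 output channels, 3x3 kernels
b = np.random.randn(4)
out = conv_forward(X, W, b, stride=(1, 1), padding=(1, 1))
print(out.shape)          # (2, 4, 5, 5): spatial size preserved by the 1-pixel padding
print((out >= 0).all())   # True: ReLU removes every negative value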
Backward propagation
Principle
Through backward propagation we want the partial derivatives of the loss function J(W,b) with respect to z^{l}, W^{l}, and b^{l}. We denote
\delta^{l}=\frac{\partial J(W, b)}{\partial z^{l}}
\delta^{l-1}=\delta^{l} * \operatorname{rot180}\left(W^{l}\right) \odot \sigma^{\prime}\left(z^{l-1}\right)
where \odot denotes the Hadamard product: for two vectors of the same dimension A=(a_{1},a_{2},...,a_{n})^{T} and B=(b_{1},b_{2},...,b_{n})^{T}, we have A \odot B=(a_{1}b_{1},a_{2}b_{2},...,a_{n}b_{n})^{T}. \sigma^{\prime}(z^{l-1}) is the derivative of the ReLU function: it is zero when the argument is less than zero and 1 otherwise. The article in reference link 1 explains why the kernel is rotated by 180 degrees.
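A tiny NumPy illustration of the Hadamard product and of this ReLU-derivative convention (the numbers are arbitrary):

import numpy as np
A = np.array([1., -2., 3.])
B = np.array([4., 5., -6.])
print(A * B)                    # Hadamard product: [  4. -10. -18.]
z = np.array([-1.5, 0.0, 2.0])
print((z >= 0).astype(float))   # ReLU derivative under the convention above: [0. 1. 1.]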
The gradients of the loss with respect to the kernel and the bias are:
\frac{\partial J(W, b)}{\partial W^{l}}=a^{l-1} * \delta^{l}
\frac{\partial J(W, b)}{\partial b^{l}}=\sum_{u, v}\left(\delta^{l}\right)_{u, v}
Code
The partial derivatives of the loss function J(W,b) with respect to z^{l}, a^{l-1}, W^{l}, b^{l}, z^{l-1} are denoted dz, dx, dw, db, dz0, respectively.
def conv_backward(dz, X, W, b, stride=(1, 1), padding=(0, 0)):
    """
    dz: gradient with respect to z of this layer
    dz0: gradient with respect to z of the previous convolutional layer
    dx: gradient with respect to x (the input a^{l-1})
    dw: gradient with respect to w
    db: gradient with respect to b
    """
    m, f, _, _ = dz.shape
    m, c, Ih, Iw = X.shape
    _, _, Kh, Kw = W.shape
    Sw, Sh = stride
    Pw, Ph = padding
    dx, dw, db = np.zeros_like(X), np.zeros_like(W), np.zeros_like(b)
    X_pad = np.pad(X, [(0, 0), (0, 0), (Ph, Ph), (Pw, Pw)], 'constant')
    dx_pad = np.pad(dx, [(0, 0), (0, 0), (Ph, Ph), (Pw, Pw)], 'constant')
    db = np.sum(dz, axis=(0, 2, 3))          # bias gradient: sum over batch and spatial dimensions
    for k in range(dz.shape[0]):             # sample in the batch
        for i in range(dz.shape[2]):         # output row
            for j in range(dz.shape[3]):     # output column
                x_window = X_pad[k, :, i * Sh : i * Sh + Kh, j * Sw : j * Sw + Kw]
                for n in range(f):           # output channel
                    dw[n] += x_window * dz[k, n, i, j]
                    dx_pad[k, :, i * Sh : i * Sh + Kh, j * Sw : j * Sw + Kw] += np.flip(W[n], axis=(1, 2)) * dz[k, n, i, j]
    dx = dx_pad[:, :, Ph:Ph + Ih, Pw:Pw + Iw]   # drop the padding
    dz0 = dx * (X > 0)                          # ReLU derivative: X = relu(z^{l-1}), so the mask is X > 0
    return dx, dw, db, dz0
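A rough numerical gradient check, assuming small made-up tensors: it compares dw from conv_backward with central differences of the scalar loss L = sum(conv_forward(X, W, b)) (the shapes, seed, and epsilon below are arbitrary choices for illustration):

import numpy as np
np.random.seed(1)
X = np.random.randn(2, 1, 4, 4)    # 2 samples, 1 channel, 4x4
W = np.random.randn(2, 1, 2, 2)    # 2 output channels, 2x2 kernels
b = np.random.randn(2)

def loss(W_):
    return np.sum(conv_forward(X, W_, b))   # scalar loss: sum of a^{l} = relu(z^{l})

out = conv_forward(X, W, b)
dz = (out > 0).astype(float)       # dL/dz for this loss, since dL/da is 1 everywhere
dx, dw, db, dz0 = conv_backward(dz, X, W, b)

eps = 1e-6
dw_num = np.zeros_like(W)          # numerical gradient of the loss with respect to W
it = np.nditer(W, flags=['multi_index'])
while not it.finished:
    idx = it.multi_index
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[idx] += eps
    W_minus[idx] -= eps
    dw_num[idx] = (loss(W_plus) - loss(W_minus)) / (2 * eps)
    it.iternext()
print(np.max(np.abs(dw - dw_num)))   # should be close to zero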
Complete code
import numpy as np

def relu(z):
    z[z < 0] = 0   # zero out the negative entries in place
    return z

def conv_forward(X, W, b, stride=(1, 1), padding=(0, 0)):
    m, c, Ih, Iw = X.shape          # batch size, input channels, input height/width
    f, _, Kh, Kw = W.shape          # output channels, kernel height/width
    Sw, Sh = stride
    Pw, Ph = padding
    Oh = int(1 + (Ih + 2 * Ph - Kh) / Sh)   # output height
    Ow = int(1 + (Iw + 2 * Pw - Kw) / Sw)   # output width
    Out = np.zeros([m, f, Oh, Ow])
    X_pad = np.zeros((m, c, Ih + 2 * Ph, Iw + 2 * Pw))
    X_pad[:, :, Ph:Ph + Ih, Pw:Pw + Iw] = X  # zero-pad the input
    for n in range(Out.shape[1]):            # output channel
        for i in range(Out.shape[2]):        # output row
            for j in range(Out.shape[3]):    # output column
                Out[:, n, i, j] = np.sum(X_pad[:, :, i*Sh : i*Sh+Kh, j*Sw : j*Sw+Kw] * W[n, :, :, :], axis=(1, 2, 3))
        Out[:, n, :, :] += b[n]              # add the bias of this output channel
    Out = relu(Out)                          # apply the ReLU activation
    return Out

def conv_backward(dz, X, W, b, stride=(1, 1), padding=(0, 0)):
    """
    dz: gradient with respect to z of this layer
    dz0: gradient with respect to z of the previous convolutional layer
    dx: gradient with respect to x (the input a^{l-1})
    dw: gradient with respect to w
    db: gradient with respect to b
    """
    m, f, _, _ = dz.shape
    m, c, Ih, Iw = X.shape
    _, _, Kh, Kw = W.shape
    Sw, Sh = stride
    Pw, Ph = padding
    dx, dw, db = np.zeros_like(X), np.zeros_like(W), np.zeros_like(b)
    X_pad = np.pad(X, [(0, 0), (0, 0), (Ph, Ph), (Pw, Pw)], 'constant')
    dx_pad = np.pad(dx, [(0, 0), (0, 0), (Ph, Ph), (Pw, Pw)], 'constant')
    db = np.sum(dz, axis=(0, 2, 3))          # bias gradient: sum over batch and spatial dimensions
    for k in range(dz.shape[0]):             # sample in the batch
        for i in range(dz.shape[2]):         # output row
            for j in range(dz.shape[3]):     # output column
                x_window = X_pad[k, :, i * Sh : i * Sh + Kh, j * Sw : j * Sw + Kw]
                for n in range(f):           # output channel
                    dw[n] += x_window * dz[k, n, i, j]
                    dx_pad[k, :, i * Sh : i * Sh + Kh, j * Sw : j * Sw + Kw] += np.flip(W[n], axis=(1, 2)) * dz[k, n, i, j]
    dx = dx_pad[:, :, Ph:Ph + Ih, Pw:Pw + Iw]   # drop the padding
    dz0 = dx * (X > 0)                          # ReLU derivative: X = relu(z^{l-1}), so the mask is X > 0
    return dx, dw, db, dz0
References
- https://www.cnblogs.com/pinard/p/6494810.html
- https://blog.csdn.net/qq_38585359/article/details/102658211?spm=1001.2101.3001.6650.1&utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7ETopBlog-1.topblog&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7ETopBlog-1.topblog&utm_relevant_index=2