Learning rate
Optimal values anywhere from the 1e-4 to the 1e-1 order of magnitude show up in practice; as a rough rule, the simpler the model, the larger the learning rate it can tolerate.
[https://blog.csdn.net/weixin_44070747/article/details/94339089]
Other strategies:
Increase the batch size instead of decaying the learning rate (a sketch follows the link below)
[Abandon Learning Rate Decay https://www.sohu.com/a/218600766_114877]
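A minimal sketch of the idea, assuming the common heuristic that SGD gradient noise scales roughly like lr/batch_size, so doubling the batch size at a milestone plays the role that halving the learning rate would. The milestones and sizes below are made-up illustrations, not values from the linked article:

```python
def batch_size_schedule(epoch, base=128, milestones=(30, 60, 90)):
    """Double the batch size at each milestone while keeping the
    learning rate constant, instead of halving the learning rate."""
    doublings = sum(1 for m in milestones if epoch >= m)
    return base * 2 ** doublings

# The learning rate stays fixed for the whole run.
lr = 0.1
for epoch in (0, 29, 30, 60, 90):
    print(epoch, lr, batch_size_schedule(epoch))
```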
Learning rate adaptation
bold driver algorithm
bold driver algorithm: after each epoch, compare the network's loss $L(t)$ to its previous value, $L(t-1)$. If the error has decreased, increase $\eta$ by a small proportion, typically 1%-5%. If the error has increased by more than a tiny proportion (say, $10^{-10}$), however, undo the last weight change, and decrease $\eta$ sharply, typically by 50%.
[Momentum and Learning Rate Adaptation https://cnl.salk.edu/~schraudo/teach/NNcourse/momrate.html]
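A minimal runnable sketch of the bold driver rule. The toy quadratic objective, the 5% growth factor, and the starting $\eta$ are illustrative choices standing in for a real training epoch, not values from the cited page:

```python
import numpy as np

# Toy objective standing in for one epoch of training; grad_fn is its
# exact gradient. Both are illustrative stand-ins.
def loss_fn(w):
    return float(np.sum(w ** 2))

def grad_fn(w):
    return 2.0 * w

eta = 0.1                              # starting learning rate (assumed)
w = np.array([3.0, -2.0])
prev_loss = loss_fn(w)

for epoch in range(50):
    prev_w = w.copy()
    w = w - eta * grad_fn(w)           # one "epoch" of gradient descent
    cur_loss = loss_fn(w)
    if cur_loss < prev_loss:
        eta *= 1.05                    # error decreased: grow eta by 1%-5%
    elif cur_loss > prev_loss * (1 + 1e-10):
        w = prev_w                     # undo the last weight change
        eta *= 0.5                     # decrease eta sharply, by 50%
        cur_loss = prev_loss           # loss is back at its old value
    prev_loss = cur_loss

print("final loss:", prev_loss, "final eta:", eta)
```

Once $\eta$ grows large enough to overshoot, the rule undoes the step and halves $\eta$, so the rate oscillates around the largest stable value.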
Regularization parameter $\lambda$
It is recommended to set the regularization coefficient $\lambda$ to 0 at first and pin down a good learning rate. Then, keeping that learning rate fixed, give $\lambda$ an initial value (say 1.0) and, guided by validation accuracy, scale $\lambda$ up or down by factors of 10. (The 10x steps are coarse tuning; once you have found the right order of magnitude, say $\lambda = 0.01$, fine-tune around it, e.g. 0.02, 0.03, 0.009.) A sketch of this coarse-to-fine search follows the link below.
[https://www.cnblogs.com/bonelee/p/8578481.html]
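A hypothetical sketch of that coarse-then-fine search, written as a grid over powers of 10 rather than the step-by-step 10x adjustment described above, for brevity. train_and_validate is a fake stand-in whose score just peaks near $\lambda = 0.01$ so the loop has something to find; replace it with a real training-plus-validation run:

```python
import math

def train_and_validate(lr, lam):
    """Fake stand-in: returns a synthetic validation accuracy that
    peaks near lam = 0.01. Swap in real training + evaluation here."""
    return 1.0 / (1.0 + (math.log10(lam) + 2.0) ** 2)

lr = 1e-3  # assumed already tuned with lambda = 0, as described above

# Coarse stage: sweep lambda over orders of magnitude.
coarse = [10.0 ** k for k in range(-5, 2)]          # 1e-5 ... 10
best_lam = max(coarse, key=lambda lam: train_and_validate(lr, lam))

# Fine stage: try nearby values around the best order of magnitude,
# e.g. 0.009, 0.02, 0.03 around 0.01.
fine = [best_lam * f for f in (0.3, 0.5, 0.9, 1.0, 2.0, 3.0)]
best_lam = max(fine, key=lambda lam: train_and_validate(lr, lam))

print("selected lambda:", best_lam)
```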
Follow-up:
loss landscape: what do the x, y, and z axes represent?