Unsupervised Pre-training & Supervised Pre-training

2023-05-16

This post draws on several other blog posts; the specific links are given in the corresponding sections.

Unsupervised pre-training

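Unsupervised pre-training is mainly used in "complex task + small labeled dataset" settings, that is, when there is not enough labeled training data to support training the model. The technique was proposed by Hinton's group in 2006: A Fast Learning Algorithm for Deep Belief Nets.
Here is a description of how it works: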

In SGD optimization, one typically initiates model weights at random and tries to go towards minimum cost by following the opposite of gradient of objective function. For deep nets, this has not shown much of success and it is believed to be result of extremely non-convex (and high-dimensional) nature of their objective function.
What Y. Bengio and others (link to the original paper) found out was that, instead of starting weights at random and hoping that SGD will take you to minimum point of such a rugged landscape, you can pre-train each layer like an autoencoder. Here is how it works: you build an autoencoder with first layer as encoding layer and the transpose of that as decoder. And you train it unsupervised, that is you train it to reconstruct the input (refer to Autoencoder, they are great for unsupervised feature extraction tasks). Once trained, you fix weights of that layer to those you just found. Then, you move to next layers and repeat the same until you pre-train all layers of deep net (greedy approach). At this point, you go back to the original problem that you wanted to solve with deep net (classification/regression) and you optimize it with SGD but starting from weights you just learned during pre-training.
They found that this gives much better results. I think no one knows why exactly this works, but the idea is that by pre-training you start from more favorable regions of feature space.
(Link to the original answer)

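Put simply, the basic idea is to use autoencoders (earlier work used restricted Boltzmann machines instead) to train the network one layer at a time, keeping every layer other than the one being trained frozen. Once all layers have been pre-trained this way, the whole network is fine-tuned with supervised learning.
(Figure link)
Image source: https://blog.csdn.net/ningyanggege/article/details/80596728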
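The layer-by-layer procedure can be sketched in plain Python. This is only a minimal illustration under assumptions that are not in the source: sigmoid encoders, a linear decoder with tied (transposed) weights as in the quoted answer, per-sample SGD on squared reconstruction error, and a toy binary dataset. The names `train_autoencoder` and `greedy_pretrain` are hypothetical helpers.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(W, b, x):
    """Hidden code h = sigmoid(W x + b)."""
    return [sigmoid(sum(w * xj for w, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def decode(W, c, h):
    """Tied-weight reconstruction x_hat = W^T h + c."""
    return [sum(W[i][j] * h[i] for i in range(len(h))) + c[j]
            for j in range(len(c))]

def recon_error(W, b, c, data):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        x_hat = decode(W, c, encode(W, b, x))
        total += sum((a - t) ** 2 for a, t in zip(x_hat, x))
    return total / len(data)

def train_autoencoder(data, n_hidden, epochs=200, lr=0.1, seed=0):
    """Train one autoencoder layer with SGD on 0.5 * ||x_hat - x||^2."""
    rng = random.Random(seed)
    n_in = len(data[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden   # encoder bias
    c = [0.0] * n_in       # decoder bias
    for _ in range(epochs):
        for x in data:
            h = encode(W, b, x)
            x_hat = decode(W, c, h)
            d_out = [a - t for a, t in zip(x_hat, x)]
            d_hid = [sum(W[i][j] * d_out[j] for j in range(n_in)) * h[i] * (1.0 - h[i])
                     for i in range(n_hidden)]
            for i in range(n_hidden):
                for j in range(n_in):
                    # W is shared, so it receives the encoder and decoder gradients.
                    W[i][j] -= lr * (d_hid[i] * x[j] + h[i] * d_out[j])
                b[i] -= lr * d_hid[i]
            for j in range(n_in):
                c[j] -= lr * d_out[j]
    return W, b, c

def greedy_pretrain(data, layer_sizes):
    """Pre-train layers one at a time; each trained layer is then frozen."""
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        W, b, _ = train_autoencoder(inputs, n_hidden)  # decoder bias is discarded
        layers.append((W, b))
        # The frozen layer's codes become the training data for the next layer.
        inputs = [encode(W, b, x) for x in inputs]
    return layers

# Toy data (an assumption for illustration only).
data = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
layers = greedy_pretrain(data, [3, 2])
```

The returned `(W, b)` pairs would then serve as the initial weights of the supervised network, which is fine-tuned end to end; that fine-tuning step is not shown here.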
As for autoencoders, the English answer above links to the original reference; here is a brief summary of the basic idea:

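Treat the hidden layer of a neural network as an encoder paired with a decoder: the input is encoded and then decoded, and the network is trained so that its output matches its own input as closely as possible. In this way the encoder learns a representation of the input data, which can reduce the dimensionality of the data and capture its main features.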
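In symbols, one common way to write this down is shown below. The notation is introduced here for illustration and is not from the source: $x_i$ is an input, $h_i$ its hidden code, $f$ a nonlinearity such as the sigmoid, and the decoder reuses the transposed encoder weights as in the quoted answer.

```latex
h_i = f(W x_i + b), \qquad
\hat{x}_i = W^{\top} h_i + c, \qquad
\min_{W,\, b,\, c} \; \frac{1}{N} \sum_{i=1}^{N} \lVert x_i - \hat{x}_i \rVert^2
```

When the hidden code has fewer dimensions than the input, the network cannot simply copy its input and has to keep the features that matter most for reconstruction.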

Supervised pre-training

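By comparison, supervised pre-training is much easier to understand. It can be seen as a form of transfer learning: once a set of model parameters has been trained on one class of problems, applying the model to a similar but different problem does not require training the network from scratch. Instead, those parameters are used as the initial values of the network and training continues from there. The ZFNet paper (2013) describes it as follows: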

Using these tools, we start with the architecture of (Krizhevsky et al., 2012) and explore different architectures, discovering ones that outperform their results on ImageNet. We then explore the generalization ability of the model to other datasets, just retraining the softmax classifier on top. As such, this is a form of supervised pre-training, which contrasts with the unsupervised pre-training methods popularized by (Hinton et al., 2006) and others (Bengio et al., 2007; Vincent et al., 2008). The generalization ability of convnet features is also explored in concurrent work by (Donahue et al., 2013).

This approach may offer considerable advantages in convergence speed, in reducing overfitting, and on new problems with little data.
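The "just retraining the softmax classifier on top" idea from the quote can be sketched as follows. This is a toy illustration, not the ZFNet setup: `pretrained_features` is a hypothetical stand-in for a network already trained on a source task, its hard-coded weights are made up, and the target dataset is invented. Only the softmax head is trained.

```python
import math

def pretrained_features(x):
    """Hypothetical frozen feature extractor (stands in for a pre-trained network)."""
    W = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]  # made-up weights, never updated
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def scores(W, b, f):
    return [sum(w * fi for w, fi in zip(row, f)) + bk for row, bk in zip(W, b)]

def train_softmax_head(features, labels, n_classes, epochs=200, lr=0.1):
    """Fit a softmax classifier on frozen features with SGD on cross-entropy."""
    n_feat = len(features[0])
    W = [[0.0] * n_feat for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for f, y in zip(features, labels):
            p = softmax(scores(W, b, f))
            for k in range(n_classes):
                g = p[k] - (1.0 if k == y else 0.0)  # gradient w.r.t. the logit
                for j in range(n_feat):
                    W[k][j] -= lr * g * f[j]
                b[k] -= lr * g
    return W, b

def predict(W, b, x):
    s = scores(W, b, pretrained_features(x))
    return s.index(max(s))

# Small invented target task: two classes, four labeled examples.
X_target = [[2.0, 0.0], [1.5, 0.2], [0.0, 2.0], [0.3, 1.8]]
y_target = [0, 0, 1, 1]

feats = [pretrained_features(x) for x in X_target]  # extractor stays frozen
W_head, b_head = train_softmax_head(feats, y_target, n_classes=2)
preds = [predict(W_head, b_head, x) for x in X_target]
```

Because only the small head is trained, a handful of labeled examples can be enough. Continuing to update the feature extractor as well, starting from its pre-trained values, would correspond to the "continue training from these parameters" variant described above.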

Comments and corrections are welcome!
