Do I need pre-trained weights for Keras VGG16?

2024-01-03

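For context, I'm relatively new to machine learning, and I'm working on a project whose goal is to classify plays in NBA games. My inputs are sequences of 40 frames, one sequence per play, and my labels are 11 all-encompassing classes for a given play.

The plan is to take each sequence of frames and pass every frame through a CNN to extract a set of features. The sequence of features for a given video would then be passed to an RNN.

Most of my implementation currently uses Keras, and I have chosen the VGG16 model for the CNN. Here is the relevant code: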

video = keras.Input(shape = (None, 255, 255, 3), name = 'video')
cnn = keras.applications.VGG16(include_top=False, weights = None, input_shape=
(255,255,3), pooling = 'avg', classes=11)
cnn.trainable = True

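My question is: if my goal is to classify video clips of NBA games, would it still be beneficial to initialize the weights of the VGG16 ConvNet to 'imagenet'? If so, why? If not, how can I train the VGG16 ConvNet to get my own set of weights, and how do I then plug them into this function? I haven't found any tutorial where someone supplies their own set of weights when using the VGG16 model.

I apologize if my question seems naive, but I would really appreciate any help with this.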

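Should you retrain VGG16 for your specific task? Absolutely not! Retraining such a huge network is hard and requires a lot of intuition and knowledge about training deep networks. Let's analyze why you can use the weights pre-trained on ImageNet for your task: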

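ImageNet is a huge dataset containing millions of images. VGG16 itself took around 3-4 days to train on a powerful GPU. On a CPU (assuming you don't have a GPU as powerful as an NVIDIA GeForce Titan X) it would take weeks.
ImageNet contains images of real-world scenes. NBA games can also be considered real-world scenes, so features pre-trained on ImageNet are very likely to be usable for NBA games as well.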

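In fact, you don't need to use all the convolutional layers of the pre-trained VGG16. Let's take a look at a visualization of the internal VGG16 layers https://blog.keras.io/img/vgg16_filters_overview.jpg and see what they detect (taken from this article https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html; the image is too large, so I have only linked it to keep things compact):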

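The first and second convolutional blocks look at low-level features such as corners, edges, etc.
The third and fourth convolutional blocks look at surface features, curves, circles, etc.
The fifth block looks at high-level features.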
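To make the choice between blocks more concrete, here is a small sketch that computes the feature-map shape coming out of each block for a given input size. It relies only on the VGG16 architecture (3x3 convolutions with 'same' padding, 2x2 max pooling at the end of each block); the helper name `vgg16_block_output_shapes` is made up for this example.

```python
# Number of filters in each of the five VGG16 convolutional blocks.
VGG16_BLOCK_FILTERS = [64, 128, 256, 512, 512]


def vgg16_block_output_shapes(height, width):
    """Return (height, width, channels) after each block's pooling layer."""
    shapes = []
    for filters in VGG16_BLOCK_FILTERS:
        # 'same'-padded 3x3 convolutions keep the size; 2x2 max pooling halves it.
        height, width = height // 2, width // 2
        shapes.append((height, width, filters))
    return shapes


for block, shape in enumerate(vgg16_block_output_shapes(224, 224), start=1):
    print(f"block{block}_pool: {shape}")
```

Cutting the network at an earlier block gives larger, lower-level feature maps with fewer channels; cutting later gives small, high-level ones.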

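So you can decide which kind of features would be beneficial for your specific task. Do you need the high-level features of the fifth block? Or would you rather use the mid-level features of the third block? Maybe you want to stack another neural network on top of the bottom layers of VGG? For more instructions, take a look at the following tutorial that I wrote; it used to be on SO Documentation.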

Transfer Learning and Fine-Tuning using VGG and Keras

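In this example, three brief and comprehensive sub-examples are presented:

Loading weights from the available pre-trained models included with the Keras library
Stacking another network for training on top of any layer of VGG
Inserting a layer in the middle of other layers
Tips and general rules of thumb for fine-tuning and transfer learning with VGG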

Loading pre-trained weights

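Models pre-trained on ImageNet, including VGG-16 and VGG-19, are available in Keras. Here and in the rest of this example, VGG-16 will be used. For more information, please visit the Keras Applications documentation https://keras.io/applications/.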

from keras import applications

# This will load the whole VGG16 network, including the top Dense layers.
# Note: by specifying the shape of top layers, input tensor shape is forced
# to be (224, 224, 3), therefore you can use it only on 224x224 images.
vgg_model = applications.VGG16(weights='imagenet', include_top=True)

# If you are only interested in convolution filters. Note that by not
# specifying the shape of top layers, the input tensor shape is (None, None, 3),
# so you can use them for any size of images.
vgg_model = applications.VGG16(weights='imagenet', include_top=False)

# If you want to specify input tensor
from keras.layers import Input
input_tensor = Input(shape=(160, 160, 3))
vgg_model = applications.VGG16(weights='imagenet',
                               include_top=False,
                               input_tensor=input_tensor)

# To see the models' architecture and layer names, run the following
vgg_model.summary()
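The reason include_top=True pins the input to 224x224 is that the first Dense layer expects a flattened 7x7x512 feature map. The following sketch counts parameters from the architecture alone to show this, and to show how much lighter the convolutional base is than the full network. The helper names are hypothetical; no Keras call is involved.

```python
# (filters, number of 3x3 convolutions) for the five VGG16 blocks.
VGG16_BLOCKS = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]


def conv_base_params(in_channels=3):
    """Weights plus biases of all 3x3 convolutions in the VGG16 base."""
    total = 0
    for filters, num_convs in VGG16_BLOCKS:
        for _ in range(num_convs):
            total += 3 * 3 * in_channels * filters + filters
            in_channels = filters
    return total


def top_params(input_size=224, num_classes=1000):
    """Weights plus biases of the fc1, fc2 and predictions Dense layers."""
    side = input_size // 2 ** len(VGG16_BLOCKS)  # five poolings: 224 -> 7
    flat = side * side * 512                     # length of the Flatten output
    sizes = [flat, 4096, 4096, num_classes]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


print("convolutional base:", conv_base_params())
print("dense top:", top_params())
```

With a 160x160 input the Flatten output would have 5 * 5 * 512 = 12800 values instead of 25088, so the pre-trained fc1 weights would no longer fit.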

Creating a new network with bottom layers taken from VGG

Assume that for some specific task with images of size (160, 160, 3), you want to use the pre-trained bottom layers of VGG, up to the layer named block2_pool.

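# Layers used by the new network stacked on top of VGG below
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

vgg_model = applications.VGG16(weights='imagenet',
                               include_top=False,
                               input_shape=(160, 160, 3))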

# Creating dictionary that maps layer names to the layers
layer_dict = dict([(layer.name, layer) for layer in vgg_model.layers])

# Getting output tensor of the last VGG layer that we want to include
x = layer_dict['block2_pool'].output

# Stacking a new simple convolutional network on top of it    
x = Conv2D(filters=64, kernel_size=(3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Flatten()(x)
x = Dense(256, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(10, activation='softmax')(x)

# Creating new model. Please note that this is NOT a Sequential() model.
# Note: current Keras versions expect `inputs=` / `outputs=` (plural).
from keras.models import Model
custom_model = Model(inputs=vgg_model.input, outputs=x)

# Make sure that the pre-trained bottom layers are not trainable
for layer in custom_model.layers[:7]:
    layer.trainable = False

# Do not forget to compile it
custom_model.compile(loss='categorical_crossentropy',
                     optimizer='rmsprop',
                     metrics=['accuracy'])
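The slice `[:7]` above works because block2_pool is the seventh layer when the input layer is counted. As a sketch of how that number could be derived rather than hard-coded, the layer names of the VGG16 base can be generated from its architecture. The input layer's real name varies between Keras versions, so a placeholder is used, and both helper names are made up for this example.

```python
# Number of convolutions in each of the five VGG16 blocks.
CONVS_PER_BLOCK = [2, 2, 3, 3, 3]


def vgg16_layer_names():
    """Layer names of VGG16 without the top, in order."""
    names = ["<input>"]  # placeholder: the real input layer name varies
    for block, num_convs in enumerate(CONVS_PER_BLOCK, start=1):
        names += [f"block{block}_conv{i}" for i in range(1, num_convs + 1)]
        names.append(f"block{block}_pool")
    return names


def num_layers_to_freeze(last_layer_name):
    """How many leading layers to freeze so that `last_layer_name` is included."""
    return vgg16_layer_names().index(last_layer_name) + 1


print(num_layers_to_freeze("block2_pool"))
```

On a real model the same index can be obtained with `[l.name for l in vgg_model.layers].index('block2_pool') + 1`.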

Removing multiple layers and inserting a new one in the middle

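Assume that you need to speed up VGG16 by replacing block1_conv1 and block1_conv2 with a single convolutional layer, in such a way that the pre-trained weights of the remaining layers are kept. The idea is to disassemble the whole network into separate layers and then assemble it back. Here is the code for this specific task: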

from keras.layers import Conv2D
from keras.models import Model

vgg_model = applications.VGG16(include_top=True, weights='imagenet')

# Disassemble layers
layers = [l for l in vgg_model.layers]

# Defining new convolutional layer.
# Important: the number of filters should be the same!
# Note: the receptive field of two 3x3 convolutions is 5x5.
new_conv = Conv2D(filters=64, 
                  kernel_size=(5, 5),
                  name='new_conv',
                  padding='same')(layers[0].output)

# Now stack everything back
# Note: If you are going to fine tune the model, do not forget to
#       mark other layers as un-trainable
x = new_conv
for i in range(3, len(layers)):
    layers[i].trainable = False
    x = layers[i](x)

# Final touch (the list is named `layers`; Model takes `inputs=` / `outputs=`)
result_model = Model(inputs=layers[0].input, outputs=x)
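The comment about the receptive field can be checked with a short calculation. The sketch below uses the standard receptive-field recurrence for stacked layers and also compares parameter counts of the two original 3x3 convolutions against the single 5x5 replacement; the function names are hypothetical.

```python
def receptive_field(layers):
    """Receptive field of a stack of (kernel_size, stride) layers."""
    field, jump = 1, 1
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump  # each layer widens the field
        jump *= stride                     # strides spread later layers further
    return field


def conv_params(kernel_size, in_channels, filters):
    """Weights plus biases of one 2D convolution."""
    return kernel_size * kernel_size * in_channels * filters + filters


two_3x3 = conv_params(3, 3, 64) + conv_params(3, 64, 64)  # block1_conv1 + block1_conv2
one_5x5 = conv_params(5, 3, 64)                           # the new_conv replacement

print(receptive_field([(3, 1), (3, 1)]), receptive_field([(5, 1)]))
print(two_3x3, one_5x5)
```

Keep in mind that the new layer starts from random weights, so it still has to be trained while the layers after it stay frozen.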
