Caffe Layers Explained in Detail





Vision Layers

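Vision layers are declared in the header ./include/caffe/vision_layers.hpp. They usually take images as input and produce images as output. These layers focus on the 2D geometric structure of the image and process the input according to that structure. In particular, most vision layers apply an operation to some region of the input to produce the corresponding region of the output. In contrast, other layers ignore the spatial structure and simply treat the input as one big one-dimensional vector.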

Layer type: Convolution
CPU implementation: ./src/caffe/layers/convolution_layer.cpp

CUDA GPU implementation: ./src/caffe/layers/
Parameters (ConvolutionParameter convolution_param)

Required
num_output (c_o): the number of filters
kernel_size (or kernel_h and kernel_w): specifies height and width of each filter
Strongly Recommended
weight_filler [default type: 'constant' value: 0]

Optional
bias_term [default true]: specifies whether to learn and apply a set of additive biases to the filter outputs
pad (or pad_h and pad_w) [default 0]: specifies the number of pixels to (implicitly) add to each side of the input
stride (or stride_h and stride_w) [default 1]: specifies the intervals at which to apply the filters to the input
group (g) [default 1]: If g > 1, we restrict the connectivity of each filter to a subset of the input. Specifically, the input and output channels are separated into g groups, and the ith output group channels will be only connected to the ith input group channels.

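Input: $n \times c_i \times h_i \times w_i$, where $c_i$ is the number of channels, $h_i$ the height, and $w_i$ the width
Output: $n \times c_o \times h_o \times w_o$, where $h_o = (h_i + 2 \cdot \mathrm{pad}_h - \mathrm{kernel}_h) / \mathrm{stride}_h + 1$ and $w_o$ likewise.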

Pooling layers produce output of size $n \times c \times h_o \times w_o$, where $h_o$ and $w_o$ are computed in the same way as convolution.
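To make the convolution output-size formula concrete, here is a minimal sketch that evaluates it along one spatial dimension. The helper name `conv_output_size` is hypothetical (it is not a Caffe function), and it assumes the division is integer division that truncates, as plain C++ `int` arithmetic does.

```cpp
#include <cassert>

// Output extent along one spatial dimension:
//   h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1
// conv_output_size is a hypothetical helper name, not part of Caffe.
// Assumes truncating integer division.
int conv_output_size(int input, int kernel, int pad, int stride) {
  assert(stride > 0);
  assert(input + 2 * pad >= kernel);  // the filter must fit in the padded input
  return (input + 2 * pad - kernel) / stride + 1;
}
```

For example, a 227-pixel input with an 11-pixel kernel, no padding, and stride 4 gives (227 - 11) / 4 + 1 = 55 output positions.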

Local Response Normalization (LRN):
Layer type: LRN
CPU Implementation: ./src/caffe/layers/lrn_layer.cpp
CUDA GPU Implementation: ./src/caffe/layers/
Parameters (LRNParameter lrn_param)
local_size [default 5]: the number of channels to sum over (for cross channel LRN) or the side length of the square region to sum over (for within channel LRN)
alpha [default 1]: the scaling parameter (see below)
beta [default 0.75]: the exponent (see below)
norm_region [default ACROSS_CHANNELS]: whether to sum over adjacent channels (ACROSS_CHANNELS) or nearby spatial locations (WITHIN_CHANNEL)
The local response normalization layer performs a kind of “lateral inhibition” by normalizing over local input regions. In ACROSS_CHANNELS mode, the local regions extend across nearby channels, but have no spatial extent (i.e., they have shape local_size x 1 x 1). In WITHIN_CHANNEL mode, the local regions extend spatially, but are in separate channels (i.e., they have shape 1 x local_size x local_size). Each input value is divided by $(1 + (\alpha/n) \sum_i x_i^2)^\beta$, where $n$ is the size of each local region, and the sum is taken over the region centered at that value (zero padding is added where necessary).
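The ACROSS_CHANNELS rule can be sketched for a single spatial position as follows. This is only an illustration of the formula, not Caffe's implementation: the function name `lrn_across_channels` is hypothetical, the input is one value per channel, and `local_size` is assumed to be odd so that the window is centered.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Cross-channel LRN at one spatial position (illustrative sketch only).
// x holds one value per channel. Each value is divided by
//   (1 + (alpha / n) * sum_i x_i^2)^beta
// with n = local_size and the sum taken over the channel window centered
// on that value. Channels outside the valid range count as zero padding.
// Assumption: local_size is odd.
std::vector<double> lrn_across_channels(const std::vector<double>& x,
                                        int local_size, double alpha,
                                        double beta) {
  const int channels = static_cast<int>(x.size());
  const int half = (local_size - 1) / 2;
  std::vector<double> y(x.size());
  for (int c = 0; c < channels; ++c) {
    double sum_sq = 0.0;
    for (int k = c - half; k <= c + half; ++k) {
      if (k >= 0 && k < channels) {
        sum_sq += x[k] * x[k];
      }
    }
    const double scale = std::pow(1.0 + (alpha / local_size) * sum_sq, beta);
    y[c] = x[c] / scale;
  }
  return y;
}
```

With x = {1, 2, 3}, local_size = 3, alpha = 1 and beta = 1, the middle channel sees the sum 1 + 4 + 9 = 14 and becomes 2 / (1 + 14/3) = 6/17.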


Loss Layers

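This layer computes the multinomial logistic loss of its input, $\ell(\theta) = -\log(o_y)$, where $o_y$ is the predicted probability of class $y$.
Note the difference from softmax loss: softmax loss simply expands $o_y$ in terms of the raw class scores,

$\tilde{\ell}(y, z) = -\log\left(\frac{e^{z_y}}{\sum_j e^{z_j}}\right) = \log\left(\sum_j e^{z_j}\right) - z_y$,

where $z_i = \omega_i^T x + b_i$ is the linear prediction for class $i$.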
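The relationship between the two losses can be checked numerically for a single sample. The sketch below is a standalone illustration, not Caffe code: the names `softmax`, `multinomial_logistic_loss` and `softmax_loss` are hypothetical helpers, and subtracting the maximum score before exponentiating is a common stability trick that I am assuming here rather than taking from the text above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Turns raw scores z into probabilities o (hypothetical helper).
std::vector<double> softmax(const std::vector<double>& z) {
  const double z_max = *std::max_element(z.begin(), z.end());
  std::vector<double> p(z.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < z.size(); ++i) {
    p[i] = std::exp(z[i] - z_max);  // shift by the max to avoid overflow
    sum += p[i];
  }
  for (std::size_t i = 0; i < p.size(); ++i) {
    p[i] /= sum;
  }
  return p;
}

// Multinomial logistic loss on probabilities: -log(o_y).
double multinomial_logistic_loss(const std::vector<double>& o, std::size_t y) {
  return -std::log(o[y]);
}

// Softmax loss on raw scores: log(sum_j exp(z_j)) - z_y.
double softmax_loss(const std::vector<double>& z, std::size_t y) {
  const double z_max = *std::max_element(z.begin(), z.end());
  double sum = 0.0;
  for (std::size_t i = 0; i < z.size(); ++i) {
    sum += std::exp(z[i] - z_max);
  }
  return std::log(sum) + z_max - z[y];
}
```

Feeding the same scores through `softmax` and then `multinomial_logistic_loss` should agree with `softmax_loss` up to floating-point rounding.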

Type: EuclideanLoss



# L1 Norm
layer {
  name: "loss"
  type: "HingeLoss"
  bottom: "pred"
  bottom: "label"
}

# L2 Norm
layer {
  name: "loss"
  type: "HingeLoss"
  bottom: "pred"
  bottom: "label"
  top: "loss"
  hinge_loss_param {
    norm: L2
  }
}

The hinge loss layer computes a one-vs-all hinge loss (L1 norm) or squared hinge loss (L2 norm).
Sigmoid Cross-Entropy:

template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  // The forward pass computes the sigmoid outputs.
  sigmoid_bottom_vec_[0] = bottom[0];
  sigmoid_layer_->Forward(sigmoid_bottom_vec_, sigmoid_top_vec_);
  // Compute the loss (negative log likelihood)
  const int count = bottom[0]->count();
  const int num = bottom[0]->num();
  // Stable version of loss computation from input data
  const Dtype* input_data = bottom[0]->cpu_data();
  const Dtype* target = bottom[1]->cpu_data();
  Dtype loss = 0;
  for (int i = 0; i < count; ++i) {
    loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
        log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
  }
  top[0]->mutable_cpu_data()[0] = loss / num;
}
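The expression inside the loop of the forward pass is an algebraic rearrangement of the usual per-element cross-entropy, -[t log σ(x) + (1 - t) log(1 - σ(x))], chosen so that `exp` is only ever called on a non-positive argument. The standalone sketch below compares the two forms; the function names `sigmoid_ce_stable` and `sigmoid_ce_naive` are hypothetical and exist only for this comparison.

```cpp
#include <cassert>
#include <cmath>

// Per-element loss in the rearranged form used by the forward pass above.
// x is the raw input (logit), t is the target in [0, 1].
double sigmoid_ce_stable(double x, double t) {
  const double pos = (x >= 0) ? 1.0 : 0.0;
  return -(x * (t - pos) - std::log(1.0 + std::exp(x - 2.0 * x * pos)));
}

// Textbook form: -[t * log(p) + (1 - t) * log(1 - p)] with p = sigmoid(x).
// This form loses precision or overflows for inputs of large magnitude.
double sigmoid_ce_naive(double x, double t) {
  const double p = 1.0 / (1.0 + std::exp(-x));
  return -(t * std::log(p) + (1.0 - t) * std::log(1.0 - p));
}
```

For moderate inputs the two agree to rounding error, while for something like x = 1000 with t = 0 the rearranged form still returns a finite loss.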


template <typename Dtype>
void InfogainLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  const Dtype* bottom_label = bottom[1]->cpu_data();
  const Dtype* infogain_mat = NULL;
  if (bottom.size() < 3) {
    infogain_mat = infogain_.cpu_data();
  } else {
    infogain_mat = bottom[2]->cpu_data();
  }
  int num = bottom[0]->num();
  int dim = bottom[0]->count() / bottom[0]->num();
  Dtype loss = 0;
  for (int i = 0; i < num; ++i) {
    int label = static_cast<int>(bottom_label[i]);
    for (int j = 0; j < dim; ++j) {
      Dtype prob = std::max(bottom_data[i * dim + j], Dtype(kLOG_THRESHOLD));
      loss -= infogain_mat[label * dim + j] * log(prob);
    }
  }
  top[0]->mutable_cpu_data()[0] = loss / num;
}

template <typename Dtype>
void InfogainLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  if (propagate_down[1]) {
    LOG(FATAL) << this->type()
               << " Layer cannot backpropagate to label inputs.";
  }
  if (propagate_down.size() > 2 && propagate_down[2]) {
    LOG(FATAL) << this->type()
               << " Layer cannot backpropagate to infogain inputs.";
  }
  if (propagate_down[0]) {
    const Dtype* bottom_data = bottom[0]->cpu_data();
    const Dtype* bottom_label = bottom[1]->cpu_data();
    const Dtype* infogain_mat = NULL;
    if (bottom.size() < 3) {
      infogain_mat = infogain_.cpu_data();
    } else {
      infogain_mat = bottom[2]->cpu_data();
    }
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    int num = bottom[0]->num();
    int dim = bottom[0]->count() / bottom[0]->num();
    const Dtype scale = - top[0]->cpu_diff()[0] / num;
    for (int i = 0; i < num; ++i) {
      const int label = static_cast<int>(bottom_label[i]);
      for (int j = 0; j < dim; ++j) {
        Dtype prob = std::max(bottom_data[i * dim + j], Dtype(kLOG_THRESHOLD));
        bottom_diff[i * dim + j] = scale * infogain_mat[label * dim + j] / prob;
      }
    }
  }
}

INSTANTIATE_CLASS(InfogainLossLayer);

}  // namespace caffe

Accuracy and Top-k:


Activation / Neuron Layers

Input: $n \times c \times h \times w$
Output: $n \times c \times h \times w$

ReLU / Rectified-Linear and Leaky-ReLU:
Parameters (ReLUParameter relu_param)
negative_slope [default 0]: specifies whether to leak the negative part by multiplying it with the slope value rather than setting it to 0.

layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}


$f(x) = \begin{cases} x & \text{if } x > 0, \\ \text{negative\_slope} \cdot x & \text{otherwise.} \end{cases}$

When negative_slope is not set, this is equivalent to the standard ReLU, max(0, x). See my other short blog post for details.
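As a quick illustration of the piecewise definition, here is a minimal scalar sketch. The function name `relu` is hypothetical and only mirrors the formula; it is not how Caffe structures the layer internally.

```cpp
#include <cassert>
#include <cmath>

// Scalar ReLU with an optional leak:
//   x                   if x > 0
//   negative_slope * x  otherwise
// With the default negative_slope of 0 this reduces to max(0, x).
double relu(double x, double negative_slope = 0.0) {
  return x > 0 ? x : negative_slope * x;
}
```

Calling `relu(-2.0)` gives zero, while `relu(-2.0, 0.1)` leaks a small negative value of about -0.2.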


layer {
  name: "encode1neuron"
  bottom: "encode1"
  top: "encode1neuron"
  type: "Sigmoid"
}



TanH / Hyperbolic Tangent:

layer {
  name: "layer"
  bottom: "in"
  top: "out"
  type: "TanH"
}



layer {
  name: "layer"
  bottom: "in"
  top: "out"
  type: "AbsVal"
}



power [default 1]
scale [default 1]
shift [default 0]

layer {
  name: "layer"
  bottom: "in"
  top: "out"
  type: "Power"
  power_param {
    power: 1
    scale: 1
    shift: 0
  }
}




layer {
  name: "layer"
  bottom: "in"
  top: "out"
  type: "BNLL"
}

The BNLL (binomial normal log likelihood) layer computes the output as $\log(1 + \exp(x))$ for each input $x$.
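Evaluating log(1 + exp(x)) directly overflows for large positive x, so a scalar sketch usually splits on the sign of the input. The function name `bnll` below is hypothetical, and the use of `std::log1p` is my own choice for the illustration rather than something stated above.

```cpp
#include <cassert>
#include <cmath>

// Computes log(1 + exp(x)) without overflow for large |x|.
// For x > 0 we use the identity log(1 + exp(x)) = x + log(1 + exp(-x)),
// so exp is only ever called on a non-positive argument.
double bnll(double x) {
  return x > 0 ? x + std::log1p(std::exp(-x)) : std::log1p(std::exp(x));
}
```

At x = 0 this gives log(2), and for very large inputs the result approaches x itself.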


Data Layers

Common Layers

num_output (c_o): the number of filters
Strongly recommended: weight_filler [default type: 'constant' value: 0]
bias_filler [default type: ‘constant’ value: 0]
bias_term [default true]: specifies whether to learn and apply a set of additive biases to the filter outputs

layer {
  name: "fc8"
  type: "InnerProduct"
  # learning rate and decay multipliers for the weights
  param { lr_mult: 1 decay_mult: 1 }
  # learning rate and decay multipliers for the biases
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
  bottom: "fc7"
  top: "fc8"
}

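The inner product layer, also called the fully connected layer, treats its input as a one-dimensional vector and produces its output as a single vector as well, which is equivalent to setting the output blob's height and width to 1.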

slice layer
ArgMax layer
Compute the index of the K max values for each datum across all dimensions (C × H × W).

Intended for use after a classification layer to produce a prediction. If parameter out_max_val is set to true, output is a vector of pairs (max_ind, max_val) for each image. The axis parameter specifies an axis along which to maximise.
NOTE: does not implement the Backward operation.
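The top-K selection described above can be sketched for a single datum as follows. This is an illustration only: the function name `top_k` is hypothetical, the scores are assumed to be already flattened into one vector, and the (max_ind, max_val) pairs mirror what the text says is produced when out_max_val is true.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::pair<int, double> IndexedScore;  // (max_ind, max_val)

// Orders pairs by descending score.
bool higher_score(const IndexedScore& a, const IndexedScore& b) {
  return a.second > b.second;
}

// Returns the k largest scores with their indices, largest first.
// If k exceeds the number of scores, all of them are returned.
std::vector<IndexedScore> top_k(const std::vector<double>& scores,
                                std::size_t k) {
  std::vector<IndexedScore> items;
  for (std::size_t i = 0; i < scores.size(); ++i) {
    items.push_back(IndexedScore(static_cast<int>(i), scores[i]));
  }
  k = std::min(k, items.size());
  std::partial_sort(items.begin(),
                    items.begin() + static_cast<std::ptrdiff_t>(k),
                    items.end(), higher_score);
  items.resize(k);
  return items;
}
```

For class scores {0.1, 0.7, 0.2} and k = 2, the result lists index 1 first and index 2 second.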

Eltwise layer
Compute elementwise operations, such as product and sum, along multiple input Blobs.

2. AlexNet network definition


Add a class declaration for your layer to the appropriate one of common_layers.hpp, data_layers.hpp, loss_layers.hpp, neuron_layers.hpp, or vision_layers.hpp. Include an inline implementation of type and the *Blobs() methods to specify blob number requirements. Omit the *_gpu declarations if you’ll only be implementing CPU code.

Implement your layer in layers/your_layer.cpp.

SetUp for initialization: reading parameters, allocating buffers, etc.

Forward_cpu for the function your layer computes

Backward_cpu for its gradient

(Optional) Implement the GPU versions Forward_gpu and Backward_gpu in layers/

Add your layer to proto/caffe.proto, updating the next available ID. Also declare parameters, if needed, in this file.

Make your layer createable by adding it to layer_factory.cpp.

Write tests in test/test_your_layer.cpp. Use test/test_gradient_check_util.hpp to check that your Forward and Backward implementations are in numerical agreement.

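The steps above come from an answer by an expert on GitHub, and they are very clear. To make them concrete, suppose we now want to add a vision layer named Aaa_Layer: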







