【Caffe】官方例程之R-CNN物体检测

2023-10-27

R-CNN is a state-of-the-art detector that classifies region proposals by a finetuned Caffe model. For the full details of the R-CNN system and model, refer to its project site and the paper:

Rich feature hierarchies for accurate object detection and semantic segmentation. Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. CVPR 2014. Arxiv 2013.

In this example, we do detection by a pure Caffe edition of the R-CNN model for ImageNet. The R-CNN detector outputs class scores for the 200 detection classes of ILSVRC13. Keep in mind that these are raw one vs. all SVM scores, so they are not probabilistically calibrated or exactly comparable across classes. Note that this off-the-shelf model is simply for convenience, and is not the full R-CNN model.

Let's run detection on an image of a bicyclist riding a fish bike in the desert (from the ImageNet challenge—no joke).

First, we'll need region proposals and the Caffe R-CNN ImageNet model:

  • Selective Search is the region proposer used by R-CNN. The selective_search_ijcv_with_python Python module takes care of extracting proposals through the selective search MATLAB implementation. To install it, download the module and name its directory selective_search_ijcv_with_python, run the demo in MATLAB to compile the necessary functions, then add it to your PYTHONPATH for importing. (If you have your own region proposals prepared, or would rather not bother with this step, detect.py accepts a list of images and bounding boxes as CSV.)

-Run ./scripts/download_model_binary.py models/bvlc_reference_rcnn_ilsvrc13 to get the Caffe R-CNN ImageNet model.

With that done, we'll call the bundled detect.py to generate the region proposals and run the network. For an explanation of the arguments, do ./detect.py --help.

In [1]:

!mkdir -p _temp
!echo `pwd`/images/fish-bike.jpg > _temp/det_input.txt
!../python/detect.py --crop_mode=selective_search --pretrained_model=../models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel --model_def=../models/bvlc_reference_rcnn_ilsvrc13/deploy.prototxt --gpu --raw_scale=255 _temp/det_input.txt _temp/det_output.h5
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0218 20:43:25.383932 2099749632 net.cpp:42] Initializing net from parameters: 
name: "R-CNN-ilsvrc13"
input: "data"
input_dim: 10
input_dim: 3
input_dim: 227
input_dim: 227
state {
  phase: TEST
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc-rcnn"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc-rcnn"
  inner_product_param {
    num_output: 200
  }
}
I0218 20:43:25.385720 2099749632 net.cpp:336] Input 0 -> data
I0218 20:43:25.385769 2099749632 layer_factory.hpp:74] Creating layer conv1
I0218 20:43:25.385783 2099749632 net.cpp:76] Creating Layer conv1
I0218 20:43:25.385790 2099749632 net.cpp:372] conv1 <- data
I0218 20:43:25.385802 2099749632 net.cpp:334] conv1 -> conv1
I0218 20:43:25.385815 2099749632 net.cpp:105] Setting up conv1
I0218 20:43:25.386574 2099749632 net.cpp:112] Top shape: 10 96 55 55 (2904000)
I0218 20:43:25.386610 2099749632 layer_factory.hpp:74] Creating layer relu1
I0218 20:43:25.386625 2099749632 net.cpp:76] Creating Layer relu1
I0218 20:43:25.386631 2099749632 net.cpp:372] relu1 <- conv1
I0218 20:43:25.386641 2099749632 net.cpp:323] relu1 -> conv1 (in-place)
I0218 20:43:25.386649 2099749632 net.cpp:105] Setting up relu1
I0218 20:43:25.386656 2099749632 net.cpp:112] Top shape: 10 96 55 55 (2904000)
I0218 20:43:25.386663 2099749632 layer_factory.hpp:74] Creating layer pool1
I0218 20:43:25.386675 2099749632 net.cpp:76] Creating Layer pool1
I0218 20:43:25.386682 2099749632 net.cpp:372] pool1 <- conv1
I0218 20:43:25.386690 2099749632 net.cpp:334] pool1 -> pool1
I0218 20:43:25.386699 2099749632 net.cpp:105] Setting up pool1
I0218 20:43:25.386716 2099749632 net.cpp:112] Top shape: 10 96 27 27 (699840)
I0218 20:43:25.386725 2099749632 layer_factory.hpp:74] Creating layer norm1
I0218 20:43:25.386736 2099749632 net.cpp:76] Creating Layer norm1
I0218 20:43:25.386744 2099749632 net.cpp:372] norm1 <- pool1
I0218 20:43:25.386803 2099749632 net.cpp:334] norm1 -> norm1
I0218 20:43:25.386819 2099749632 net.cpp:105] Setting up norm1
I0218 20:43:25.386832 2099749632 net.cpp:112] Top shape: 10 96 27 27 (699840)
I0218 20:43:25.386842 2099749632 layer_factory.hpp:74] Creating layer conv2
I0218 20:43:25.386852 2099749632 net.cpp:76] Creating Layer conv2
I0218 20:43:25.386865 2099749632 net.cpp:372] conv2 <- norm1
I0218 20:43:25.386878 2099749632 net.cpp:334] conv2 -> conv2
I0218 20:43:25.386899 2099749632 net.cpp:105] Setting up conv2
I0218 20:43:25.387024 2099749632 net.cpp:112] Top shape: 10 256 27 27 (1866240)
I0218 20:43:25.387042 2099749632 layer_factory.hpp:74] Creating layer relu2
I0218 20:43:25.387050 2099749632 net.cpp:76] Creating Layer relu2
I0218 20:43:25.387058 2099749632 net.cpp:372] relu2 <- conv2
I0218 20:43:25.387066 2099749632 net.cpp:323] relu2 -> conv2 (in-place)
I0218 20:43:25.387075 2099749632 net.cpp:105] Setting up relu2
I0218 20:43:25.387081 2099749632 net.cpp:112] Top shape: 10 256 27 27 (1866240)
I0218 20:43:25.387089 2099749632 layer_factory.hpp:74] Creating layer pool2
I0218 20:43:25.387097 2099749632 net.cpp:76] Creating Layer pool2
I0218 20:43:25.387104 2099749632 net.cpp:372] pool2 <- conv2
I0218 20:43:25.387112 2099749632 net.cpp:334] pool2 -> pool2
I0218 20:43:25.387121 2099749632 net.cpp:105] Setting up pool2
I0218 20:43:25.387130 2099749632 net.cpp:112] Top shape: 10 256 13 13 (432640)
I0218 20:43:25.387137 2099749632 layer_factory.hpp:74] Creating layer norm2
I0218 20:43:25.387145 2099749632 net.cpp:76] Creating Layer norm2
I0218 20:43:25.387152 2099749632 net.cpp:372] norm2 <- pool2
I0218 20:43:25.387161 2099749632 net.cpp:334] norm2 -> norm2
I0218 20:43:25.387168 2099749632 net.cpp:105] Setting up norm2
I0218 20:43:25.387176 2099749632 net.cpp:112] Top shape: 10 256 13 13 (432640)
I0218 20:43:25.387228 2099749632 layer_factory.hpp:74] Creating layer conv3
I0218 20:43:25.387249 2099749632 net.cpp:76] Creating Layer conv3
I0218 20:43:25.387258 2099749632 net.cpp:372] conv3 <- norm2
I0218 20:43:25.387266 2099749632 net.cpp:334] conv3 -> conv3
I0218 20:43:25.387276 2099749632 net.cpp:105] Setting up conv3
I0218 20:43:25.389375 2099749632 net.cpp:112] Top shape: 10 384 13 13 (648960)
I0218 20:43:25.389408 2099749632 layer_factory.hpp:74] Creating layer relu3
I0218 20:43:25.389421 2099749632 net.cpp:76] Creating Layer relu3
I0218 20:43:25.389430 2099749632 net.cpp:372] relu3 <- conv3
I0218 20:43:25.389438 2099749632 net.cpp:323] relu3 -> conv3 (in-place)
I0218 20:43:25.389447 2099749632 net.cpp:105] Setting up relu3
I0218 20:43:25.389456 2099749632 net.cpp:112] Top shape: 10 384 13 13 (648960)
I0218 20:43:25.389462 2099749632 layer_factory.hpp:74] Creating layer conv4
I0218 20:43:25.389472 2099749632 net.cpp:76] Creating Layer conv4
I0218 20:43:25.389478 2099749632 net.cpp:372] conv4 <- conv3
I0218 20:43:25.389487 2099749632 net.cpp:334] conv4 -> conv4
I0218 20:43:25.389497 2099749632 net.cpp:105] Setting up conv4
I0218 20:43:25.391810 2099749632 net.cpp:112] Top shape: 10 384 13 13 (648960)
I0218 20:43:25.391856 2099749632 layer_factory.hpp:74] Creating layer relu4
I0218 20:43:25.391871 2099749632 net.cpp:76] Creating Layer relu4
I0218 20:43:25.391880 2099749632 net.cpp:372] relu4 <- conv4
I0218 20:43:25.391888 2099749632 net.cpp:323] relu4 -> conv4 (in-place)
I0218 20:43:25.391898 2099749632 net.cpp:105] Setting up relu4
I0218 20:43:25.391906 2099749632 net.cpp:112] Top shape: 10 384 13 13 (648960)
I0218 20:43:25.391913 2099749632 layer_factory.hpp:74] Creating layer conv5
I0218 20:43:25.391923 2099749632 net.cpp:76] Creating Layer conv5
I0218 20:43:25.391929 2099749632 net.cpp:372] conv5 <- conv4
I0218 20:43:25.391937 2099749632 net.cpp:334] conv5 -> conv5
I0218 20:43:25.391947 2099749632 net.cpp:105] Setting up conv5
I0218 20:43:25.393072 2099749632 net.cpp:112] Top shape: 10 256 13 13 (432640)
I0218 20:43:25.393108 2099749632 layer_factory.hpp:74] Creating layer relu5
I0218 20:43:25.393122 2099749632 net.cpp:76] Creating Layer relu5
I0218 20:43:25.393129 2099749632 net.cpp:372] relu5 <- conv5
I0218 20:43:25.393138 2099749632 net.cpp:323] relu5 -> conv5 (in-place)
I0218 20:43:25.393148 2099749632 net.cpp:105] Setting up relu5
I0218 20:43:25.393157 2099749632 net.cpp:112] Top shape: 10 256 13 13 (432640)
I0218 20:43:25.393167 2099749632 layer_factory.hpp:74] Creating layer pool5
I0218 20:43:25.393175 2099749632 net.cpp:76] Creating Layer pool5
I0218 20:43:25.393182 2099749632 net.cpp:372] pool5 <- conv5
I0218 20:43:25.393190 2099749632 net.cpp:334] pool5 -> pool5
I0218 20:43:25.393199 2099749632 net.cpp:105] Setting up pool5
I0218 20:43:25.393209 2099749632 net.cpp:112] Top shape: 10 256 6 6 (92160)
I0218 20:43:25.393218 2099749632 layer_factory.hpp:74] Creating layer fc6
I0218 20:43:25.393226 2099749632 net.cpp:76] Creating Layer fc6
I0218 20:43:25.393232 2099749632 net.cpp:372] fc6 <- pool5
I0218 20:43:25.393240 2099749632 net.cpp:334] fc6 -> fc6
I0218 20:43:25.393249 2099749632 net.cpp:105] Setting up fc6
I0218 20:43:25.516396 2099749632 net.cpp:112] Top shape: 10 4096 1 1 (40960)
I0218 20:43:25.516445 2099749632 layer_factory.hpp:74] Creating layer relu6
I0218 20:43:25.516463 2099749632 net.cpp:76] Creating Layer relu6
I0218 20:43:25.516470 2099749632 net.cpp:372] relu6 <- fc6
I0218 20:43:25.516480 2099749632 net.cpp:323] relu6 -> fc6 (in-place)
I0218 20:43:25.516490 2099749632 net.cpp:105] Setting up relu6
I0218 20:43:25.516497 2099749632 net.cpp:112] Top shape: 10 4096 1 1 (40960)
I0218 20:43:25.516505 2099749632 layer_factory.hpp:74] Creating layer drop6
I0218 20:43:25.516515 2099749632 net.cpp:76] Creating Layer drop6
I0218 20:43:25.516521 2099749632 net.cpp:372] drop6 <- fc6
I0218 20:43:25.516530 2099749632 net.cpp:323] drop6 -> fc6 (in-place)
I0218 20:43:25.516538 2099749632 net.cpp:105] Setting up drop6
I0218 20:43:25.516557 2099749632 net.cpp:112] Top shape: 10 4096 1 1 (40960)
I0218 20:43:25.516566 2099749632 layer_factory.hpp:74] Creating layer fc7
I0218 20:43:25.516576 2099749632 net.cpp:76] Creating Layer fc7
I0218 20:43:25.516582 2099749632 net.cpp:372] fc7 <- fc6
I0218 20:43:25.516589 2099749632 net.cpp:334] fc7 -> fc7
I0218 20:43:25.516599 2099749632 net.cpp:105] Setting up fc7
I0218 20:43:25.604786 2099749632 net.cpp:112] Top shape: 10 4096 1 1 (40960)
I0218 20:43:25.604838 2099749632 layer_factory.hpp:74] Creating layer relu7
I0218 20:43:25.604852 2099749632 net.cpp:76] Creating Layer relu7
I0218 20:43:25.604859 2099749632 net.cpp:372] relu7 <- fc7
I0218 20:43:25.604868 2099749632 net.cpp:323] relu7 -> fc7 (in-place)
I0218 20:43:25.604878 2099749632 net.cpp:105] Setting up relu7
I0218 20:43:25.604885 2099749632 net.cpp:112] Top shape: 10 4096 1 1 (40960)
I0218 20:43:25.604893 2099749632 layer_factory.hpp:74] Creating layer drop7
I0218 20:43:25.604902 2099749632 net.cpp:76] Creating Layer drop7
I0218 20:43:25.604908 2099749632 net.cpp:372] drop7 <- fc7
I0218 20:43:25.604917 2099749632 net.cpp:323] drop7 -> fc7 (in-place)
I0218 20:43:25.604924 2099749632 net.cpp:105] Setting up drop7
I0218 20:43:25.604933 2099749632 net.cpp:112] Top shape: 10 4096 1 1 (40960)
I0218 20:43:25.604939 2099749632 layer_factory.hpp:74] Creating layer fc-rcnn
I0218 20:43:25.604948 2099749632 net.cpp:76] Creating Layer fc-rcnn
I0218 20:43:25.604954 2099749632 net.cpp:372] fc-rcnn <- fc7
I0218 20:43:25.604962 2099749632 net.cpp:334] fc-rcnn -> fc-rcnn
I0218 20:43:25.604971 2099749632 net.cpp:105] Setting up fc-rcnn
I0218 20:43:25.606878 2099749632 net.cpp:112] Top shape: 10 200 1 1 (2000)
I0218 20:43:25.606904 2099749632 net.cpp:165] fc-rcnn does not need backward computation.
I0218 20:43:25.606909 2099749632 net.cpp:165] drop7 does not need backward computation.
I0218 20:43:25.606916 2099749632 net.cpp:165] relu7 does not need backward computation.
I0218 20:43:25.606922 2099749632 net.cpp:165] fc7 does not need backward computation.
I0218 20:43:25.606928 2099749632 net.cpp:165] drop6 does not need backward computation.
I0218 20:43:25.606935 2099749632 net.cpp:165] relu6 does not need backward computation.
I0218 20:43:25.606940 2099749632 net.cpp:165] fc6 does not need backward computation.
I0218 20:43:25.606946 2099749632 net.cpp:165] pool5 does not need backward computation.
I0218 20:43:25.606952 2099749632 net.cpp:165] relu5 does not need backward computation.
I0218 20:43:25.606958 2099749632 net.cpp:165] conv5 does not need backward computation.
I0218 20:43:25.606964 2099749632 net.cpp:165] relu4 does not need backward computation.
I0218 20:43:25.606971 2099749632 net.cpp:165] conv4 does not need backward computation.
I0218 20:43:25.606976 2099749632 net.cpp:165] relu3 does not need backward computation.
I0218 20:43:25.606982 2099749632 net.cpp:165] conv3 does not need backward computation.
I0218 20:43:25.606988 2099749632 net.cpp:165] norm2 does not need backward computation.
I0218 20:43:25.606995 2099749632 net.cpp:165] pool2 does not need backward computation.
I0218 20:43:25.607002 2099749632 net.cpp:165] relu2 does not need backward computation.
I0218 20:43:25.607007 2099749632 net.cpp:165] conv2 does not need backward computation.
I0218 20:43:25.607013 2099749632 net.cpp:165] norm1 does not need backward computation.
I0218 20:43:25.607199 2099749632 net.cpp:165] pool1 does not need backward computation.
I0218 20:43:25.607213 2099749632 net.cpp:165] relu1 does not need backward computation.
I0218 20:43:25.607219 2099749632 net.cpp:165] conv1 does not need backward computation.
I0218 20:43:25.607225 2099749632 net.cpp:201] This network produces output fc-rcnn
I0218 20:43:25.607239 2099749632 net.cpp:446] Collecting Learning Rate and Weight Decay.
I0218 20:43:25.607255 2099749632 net.cpp:213] Network initialization done.
I0218 20:43:25.607262 2099749632 net.cpp:214] Memory required for data: 62425920
E0218 20:43:26.388214 2099749632 upgrade_proto.cpp:618] Attempting to upgrade input file specified using deprecated V1LayerParameter: ../models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel
I0218 20:43:27.089423 2099749632 upgrade_proto.cpp:626] Successfully upgraded file specified using deprecated V1LayerParameter
GPU mode
Loading input...
selective_search_rcnn({'/Users/shelhamer/h/desk/caffe/caffe-dev/examples/images/fish-bike.jpg'}, '/var/folders/bk/dtkn5qjd11bd17b2j36zplyw0000gp/T/tmpakaRLL.mat')
Processed 1570 windows in 102.895 s.
/Users/shelhamer/anaconda/lib/python2.7/site-packages/pandas/io/pytables.py:2453: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed,key->block1_values] [items->['prediction']]

  warnings.warn(ws, PerformanceWarning)
Saved to _temp/det_output.h5 in 0.298 s.

This run was in GPU mode. For CPU mode detection, call detect.py without the --gpu argument.

Running this outputs a DataFrame with the filenames, selected windows, and their detection scores to an HDF5 file. (We only ran on one image, so the filenames will all be the same.)

In [2]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_hdf('_temp/det_output.h5', 'df')
print(df.shape)
print(df.iloc[0])
(1570, 5)
prediction    [-2.62247, -2.84579, -2.85122, -3.20838, -1.94...
ymin                                                     79.846
xmin                                                       9.62
ymax                                                     246.31
xmax                                                    339.624
Name: /Users/shelhamer/h/desk/caffe/caffe-dev/examples/images/fish-bike.jpg, dtype: object

1570 regions were proposed with the R-CNN configuration of selective search. The number of proposals will vary from image to image based on its contents and size -- selective search isn't scale invariant.

In general, detect.py is most efficient when running on a lot of images: it first extracts window proposals for all of them, batches the windows for efficient GPU processing, and then outputs the results. Simply list an image per line in the images_file, and it will process all of them.

Although this guide gives an example of R-CNN ImageNet detection, detect.py is clever enough to adapt to different Caffe models’ input dimensions, batch size, and output categories. You can switch the model definition and pretrained model as desired. Refer to python detect.py --help for the parameters to describe your data set. There's no need for hardcoding.

Anyway, let's now load the ILSVRC13 detection class names and make a DataFrame of the predictions. Note you'll need the auxiliary ilsvrc2012 data fetched by data/ilsvrc12/get_ilsvrc12_aux.sh.

In [3]:

with open('../data/ilsvrc12/det_synset_words.txt') as f:
    labels_df = pd.DataFrame([
        {
            'synset_id': l.strip().split(' ')[0],
            'name': ' '.join(l.strip().split(' ')[1:]).split(',')[0]
        }
        for l in f.readlines()
    ])
labels_df.sort('synset_id')
predictions_df = pd.DataFrame(np.vstack(df.prediction.values), columns=labels_df['name'])
print(predictions_df.iloc[0])
name
accordion      -2.622471
airplane       -2.845788
ant            -2.851219
antelope       -3.208377
apple          -1.949950
armadillo      -2.472935
artichoke      -2.201684
axe            -2.327404
baby bed       -2.737925
backpack       -2.176763
bagel          -2.681061
balance beam   -2.722538
banana         -2.390628
band aid       -1.598909
banjo          -2.298197
...
trombone        -2.582361
trumpet         -2.352853
turtle          -2.360859
tv or monitor   -2.761043
unicycle        -2.218467
vacuum          -1.907717
violin          -2.757079
volleyball      -2.723689
waffle iron     -2.418540
washer          -2.408994
water bottle    -2.174899
watercraft      -2.837425
whale           -3.120338
wine bottle     -2.772960
zebra           -2.742913
Name: 0, Length: 200, dtype: float32

Let's look at the activations.

In [4]:

plt.gray()
plt.matshow(predictions_df.values)
plt.xlabel('Classes')
plt.ylabel('Windows')

Out[4]:

<matplotlib.text.Text at 0x114f15f90>
<matplotlib.figure.Figure at 0x114254b50>

Now let's take max across all windows and plot the top classes.

In [5]:

max_s = predictions_df.max(0)
max_s.sort(ascending=False)
print(max_s[:10])
name
person          1.835771
bicycle         0.866110
unicycle        0.057080
motorcycle     -0.006122
banjo          -0.028209
turtle         -0.189831
electric fan   -0.206788
cart           -0.214235
lizard         -0.393519
helmet         -0.477942
dtype: float32

The top detections are in fact a person and bicycle. Picking good localizations is a work in progress; we pick the top-scoring person and bicycle detections.

In [6]:

# Find, print, and display the top detections: person and bicycle.
i = predictions_df['person'].argmax()
j = predictions_df['bicycle'].argmax()

# Show top predictions for top detection.
f = pd.Series(df['prediction'].iloc[i], index=labels_df['name'])
print('Top detection:')
print(f.order(ascending=False)[:5])
print('')

# Show top predictions for second-best detection.
f = pd.Series(df['prediction'].iloc[j], index=labels_df['name'])
print('Second-best detection:')
print(f.order(ascending=False)[:5])

# Show top detection in red, second-best top detection in blue.
im = plt.imread('images/fish-bike.jpg')
plt.imshow(im)
currentAxis = plt.gca()

det = df.iloc[i]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='r', linewidth=5))

det = df.iloc[j]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='b', linewidth=5))
Top detection:
name
person             1.835771
swimming trunks   -1.150371
rubber eraser     -1.231106
turtle            -1.266037
plastic bag       -1.303265
dtype: float32

Second-best detection:
name
bicycle     0.866110
unicycle   -0.359139
scorpion   -0.811621
lobster    -0.982891
lamp       -1.096808
dtype: float32

Out[6]:

<matplotlib.patches.Rectangle at 0x118576a90>

That's cool. Let's take all 'bicycle' detections and NMS them to get rid of overlapping windows.

In [7]:

def nms_detections(dets, overlap=0.3):
    """
    Non-maximum suppression: Greedily select high-scoring detections and
    skip detections that are significantly covered by a previously
    selected detection.

    This version is translated from Matlab code by Tomasz Malisiewicz,
    who sped up Pedro Felzenszwalb's code.

    Parameters
    ----------
    dets: ndarray
        each row is ['xmin', 'ymin', 'xmax', 'ymax', 'score']
    overlap: float
        minimum overlap ratio (0.3 default)

    Output
    ------
    dets: ndarray
        remaining after suppression.
    """
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    ind = np.argsort(dets[:, 4])

    w = x2 - x1
    h = y2 - y1
    area = (w * h).astype(float)

    pick = []
    while len(ind) > 0:
        i = ind[-1]
        pick.append(i)
        ind = ind[:-1]

        xx1 = np.maximum(x1[i], x1[ind])
        yy1 = np.maximum(y1[i], y1[ind])
        xx2 = np.minimum(x2[i], x2[ind])
        yy2 = np.minimum(y2[i], y2[ind])

        w = np.maximum(0., xx2 - xx1)
        h = np.maximum(0., yy2 - yy1)

        wh = w * h
        o = wh / (area[i] + area[ind] - wh)

        ind = ind[np.nonzero(o <= overlap)[0]]

    return dets[pick, :]

In [8]:

scores = predictions_df['bicycle']
windows = df[['xmin', 'ymin', 'xmax', 'ymax']].values
dets = np.hstack((windows, scores[:, np.newaxis]))
nms_dets = nms_detections(dets)

Show top 3 NMS'd detections for 'bicycle' in the image and note the gap between the top scoring box (red) and the remaining boxes.

In [9]:

plt.imshow(im)
currentAxis = plt.gca()
colors = ['r', 'b', 'y']
for c, det in zip(colors, nms_dets[:3]):
    currentAxis.add_patch(
        plt.Rectangle((det[0], det[1]), det[2]-det[0], det[3]-det[1],
        fill=False, edgecolor=c, linewidth=5)
    )
print 'scores:', nms_dets[:3, 4]
scores: [ 0.86610985 -0.70051557 -1.34796357]

This was an easy instance for bicycle as it was in the class's training set. However, the person result is a true detection since this was not in the set for that class.

You should try out detection on an image of your own next!

(Remove the temp directory to clean up, and we're done.)

In [10]:

!rm -rf _temp

R-CNN is a state-of-the-art detector that classifies region proposals by a finetuned Caffe model. For the full details of the R-CNN system and model, refer to its project site and the paper:

Rich feature hierarchies for accurate object detection and semantic segmentation. Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. CVPR 2014. Arxiv 2013.

In this example, we do detection by a pure Caffe edition of the R-CNN model for ImageNet. The R-CNN detector outputs class scores for the 200 detection classes of ILSVRC13. Keep in mind that these are raw one vs. all SVM scores, so they are not probabilistically calibrated or exactly comparable across classes. Note that this off-the-shelf model is simply for convenience, and is not the full R-CNN model.

Let's run detection on an image of a bicyclist riding a fish bike in the desert (from the ImageNet challenge—no joke).

First, we'll need region proposals and the Caffe R-CNN ImageNet model:

  • Selective Search is the region proposer used by R-CNN. The selective_search_ijcv_with_python Python module takes care of extracting proposals through the selective search MATLAB implementation. To install it, download the module and name its directory selective_search_ijcv_with_python, run the demo in MATLAB to compile the necessary functions, then add it to your PYTHONPATH for importing. (If you have your own region proposals prepared, or would rather not bother with this step, detect.py accepts a list of images and bounding boxes as CSV.)

-Run ./scripts/download_model_binary.py models/bvlc_reference_rcnn_ilsvrc13 to get the Caffe R-CNN ImageNet model.

With that done, we'll call the bundled detect.py to generate the region proposals and run the network. For an explanation of the arguments, do ./detect.py --help.

In [1]:

!mkdir -p _temp
!echo `pwd`/images/fish-bike.jpg > _temp/det_input.txt
!../python/detect.py --crop_mode=selective_search --pretrained_model=../models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel --model_def=../models/bvlc_reference_rcnn_ilsvrc13/deploy.prototxt --gpu --raw_scale=255 _temp/det_input.txt _temp/det_output.h5
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0218 20:43:25.383932 2099749632 net.cpp:42] Initializing net from parameters: 
name: "R-CNN-ilsvrc13"
input: "data"
input_dim: 10
input_dim: 3
input_dim: 227
input_dim: 227
state {
  phase: TEST
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc-rcnn"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc-rcnn"
  inner_product_param {
    num_output: 200
  }
}
I0218 20:43:25.385720 2099749632 net.cpp:336] Input 0 -> data
I0218 20:43:25.385769 2099749632 layer_factory.hpp:74] Creating layer conv1
I0218 20:43:25.385783 2099749632 net.cpp:76] Creating Layer conv1
I0218 20:43:25.385790 2099749632 net.cpp:372] conv1 <- data
I0218 20:43:25.385802 2099749632 net.cpp:334] conv1 -> conv1
I0218 20:43:25.385815 2099749632 net.cpp:105] Setting up conv1
I0218 20:43:25.386574 2099749632 net.cpp:112] Top shape: 10 96 55 55 (2904000)
I0218 20:43:25.386610 2099749632 layer_factory.hpp:74] Creating layer relu1
I0218 20:43:25.386625 2099749632 net.cpp:76] Creating Layer relu1
I0218 20:43:25.386631 2099749632 net.cpp:372] relu1 <- conv1
I0218 20:43:25.386641 2099749632 net.cpp:323] relu1 -> conv1 (in-place)
I0218 20:43:25.386649 2099749632 net.cpp:105] Setting up relu1
I0218 20:43:25.386656 2099749632 net.cpp:112] Top shape: 10 96 55 55 (2904000)
I0218 20:43:25.386663 2099749632 layer_factory.hpp:74] Creating layer pool1
I0218 20:43:25.386675 2099749632 net.cpp:76] Creating Layer pool1
I0218 20:43:25.386682 2099749632 net.cpp:372] pool1 <- conv1
I0218 20:43:25.386690 2099749632 net.cpp:334] pool1 -> pool1
I0218 20:43:25.386699 2099749632 net.cpp:105] Setting up pool1
I0218 20:43:25.386716 2099749632 net.cpp:112] Top shape: 10 96 27 27 (699840)
I0218 20:43:25.386725 2099749632 layer_factory.hpp:74] Creating layer norm1
I0218 20:43:25.386736 2099749632 net.cpp:76] Creating Layer norm1
I0218 20:43:25.386744 2099749632 net.cpp:372] norm1 <- pool1
I0218 20:43:25.386803 2099749632 net.cpp:334] norm1 -> norm1
I0218 20:43:25.386819 2099749632 net.cpp:105] Setting up norm1
I0218 20:43:25.386832 2099749632 net.cpp:112] Top shape: 10 96 27 27 (699840)
I0218 20:43:25.386842 2099749632 layer_factory.hpp:74] Creating layer conv2
I0218 20:43:25.386852 2099749632 net.cpp:76] Creating Layer conv2
I0218 20:43:25.386865 2099749632 net.cpp:372] conv2 <- norm1
I0218 20:43:25.386878 2099749632 net.cpp:334] conv2 -> conv2
I0218 20:43:25.386899 2099749632 net.cpp:105] Setting up conv2
I0218 20:43:25.387024 2099749632 net.cpp:112] Top shape: 10 256 27 27 (1866240)
I0218 20:43:25.387042 2099749632 layer_factory.hpp:74] Creating layer relu2
I0218 20:43:25.387050 2099749632 net.cpp:76] Creating Layer relu2
I0218 20:43:25.387058 2099749632 net.cpp:372] relu2 <- conv2
I0218 20:43:25.387066 2099749632 net.cpp:323] relu2 -> conv2 (in-place)
I0218 20:43:25.387075 2099749632 net.cpp:105] Setting up relu2
I0218 20:43:25.387081 2099749632 net.cpp:112] Top shape: 10 256 27 27 (1866240)
I0218 20:43:25.387089 2099749632 layer_factory.hpp:74] Creating layer pool2
I0218 20:43:25.387097 2099749632 net.cpp:76] Creating Layer pool2
I0218 20:43:25.387104 2099749632 net.cpp:372] pool2 <- conv2
I0218 20:43:25.387112 2099749632 net.cpp:334] pool2 -> pool2
I0218 20:43:25.387121 2099749632 net.cpp:105] Setting up pool2
I0218 20:43:25.387130 2099749632 net.cpp:112] Top shape: 10 256 13 13 (432640)
I0218 20:43:25.387137 2099749632 layer_factory.hpp:74] Creating layer norm2
I0218 20:43:25.387145 2099749632 net.cpp:76] Creating Layer norm2
I0218 20:43:25.387152 2099749632 net.cpp:372] norm2 <- pool2
I0218 20:43:25.387161 2099749632 net.cpp:334] norm2 -> norm2
I0218 20:43:25.387168 2099749632 net.cpp:105] Setting up norm2
I0218 20:43:25.387176 2099749632 net.cpp:112] Top shape: 10 256 13 13 (432640)
I0218 20:43:25.387228 2099749632 layer_factory.hpp:74] Creating layer conv3
I0218 20:43:25.387249 2099749632 net.cpp:76] Creating Layer conv3
I0218 20:43:25.387258 2099749632 net.cpp:372] conv3 <- norm2
I0218 20:43:25.387266 2099749632 net.cpp:334] conv3 -> conv3
I0218 20:43:25.387276 2099749632 net.cpp:105] Setting up conv3
I0218 20:43:25.389375 2099749632 net.cpp:112] Top shape: 10 384 13 13 (648960)
I0218 20:43:25.389408 2099749632 layer_factory.hpp:74] Creating layer relu3
I0218 20:43:25.389421 2099749632 net.cpp:76] Creating Layer relu3
I0218 20:43:25.389430 2099749632 net.cpp:372] relu3 <- conv3
I0218 20:43:25.389438 2099749632 net.cpp:323] relu3 -> conv3 (in-place)
I0218 20:43:25.389447 2099749632 net.cpp:105] Setting up relu3
I0218 20:43:25.389456 2099749632 net.cpp:112] Top shape: 10 384 13 13 (648960)
I0218 20:43:25.389462 2099749632 layer_factory.hpp:74] Creating layer conv4
I0218 20:43:25.389472 2099749632 net.cpp:76] Creating Layer conv4
I0218 20:43:25.389478 2099749632 net.cpp:372] conv4 <- conv3
I0218 20:43:25.389487 2099749632 net.cpp:334] conv4 -> conv4
I0218 20:43:25.389497 2099749632 net.cpp:105] Setting up conv4
I0218 20:43:25.391810 2099749632 net.cpp:112] Top shape: 10 384 13 13 (648960)
I0218 20:43:25.391856 2099749632 layer_factory.hpp:74] Creating layer relu4
I0218 20:43:25.391871 2099749632 net.cpp:76] Creating Layer relu4
I0218 20:43:25.391880 2099749632 net.cpp:372] relu4 <- conv4
I0218 20:43:25.391888 2099749632 net.cpp:323] relu4 -> conv4 (in-place)
I0218 20:43:25.391898 2099749632 net.cpp:105] Setting up relu4
I0218 20:43:25.391906 2099749632 net.cpp:112] Top shape: 10 384 13 13 (648960)
I0218 20:43:25.391913 2099749632 layer_factory.hpp:74] Creating layer conv5
I0218 20:43:25.391923 2099749632 net.cpp:76] Creating Layer conv5
I0218 20:43:25.391929 2099749632 net.cpp:372] conv5 <- conv4
I0218 20:43:25.391937 2099749632 net.cpp:334] conv5 -> conv5
I0218 20:43:25.391947 2099749632 net.cpp:105] Setting up conv5
I0218 20:43:25.393072 2099749632 net.cpp:112] Top shape: 10 256 13 13 (432640)
I0218 20:43:25.393108 2099749632 layer_factory.hpp:74] Creating layer relu5
I0218 20:43:25.393122 2099749632 net.cpp:76] Creating Layer relu5
I0218 20:43:25.393129 2099749632 net.cpp:372] relu5 <- conv5
I0218 20:43:25.393138 2099749632 net.cpp:323] relu5 -> conv5 (in-place)
I0218 20:43:25.393148 2099749632 net.cpp:105] Setting up relu5
I0218 20:43:25.393157 2099749632 net.cpp:112] Top shape: 10 256 13 13 (432640)
I0218 20:43:25.393167 2099749632 layer_factory.hpp:74] Creating layer pool5
I0218 20:43:25.393175 2099749632 net.cpp:76] Creating Layer pool5
I0218 20:43:25.393182 2099749632 net.cpp:372] pool5 <- conv5
I0218 20:43:25.393190 2099749632 net.cpp:334] pool5 -> pool5
I0218 20:43:25.393199 2099749632 net.cpp:105] Setting up pool5
I0218 20:43:25.393209 2099749632 net.cpp:112] Top shape: 10 256 6 6 (92160)
I0218 20:43:25.393218 2099749632 layer_factory.hpp:74] Creating layer fc6
I0218 20:43:25.393226 2099749632 net.cpp:76] Creating Layer fc6
I0218 20:43:25.393232 2099749632 net.cpp:372] fc6 <- pool5
I0218 20:43:25.393240 2099749632 net.cpp:334] fc6 -> fc6
I0218 20:43:25.393249 2099749632 net.cpp:105] Setting up fc6
I0218 20:43:25.516396 2099749632 net.cpp:112] Top shape: 10 4096 1 1 (40960)
I0218 20:43:25.516445 2099749632 layer_factory.hpp:74] Creating layer relu6
I0218 20:43:25.516463 2099749632 net.cpp:76] Creating Layer relu6
I0218 20:43:25.516470 2099749632 net.cpp:372] relu6 <- fc6
I0218 20:43:25.516480 2099749632 net.cpp:323] relu6 -> fc6 (in-place)
I0218 20:43:25.516490 2099749632 net.cpp:105] Setting up relu6
I0218 20:43:25.516497 2099749632 net.cpp:112] Top shape: 10 4096 1 1 (40960)
I0218 20:43:25.516505 2099749632 layer_factory.hpp:74] Creating layer drop6
I0218 20:43:25.516515 2099749632 net.cpp:76] Creating Layer drop6
I0218 20:43:25.516521 2099749632 net.cpp:372] drop6 <- fc6
I0218 20:43:25.516530 2099749632 net.cpp:323] drop6 -> fc6 (in-place)
I0218 20:43:25.516538 2099749632 net.cpp:105] Setting up drop6
I0218 20:43:25.516557 2099749632 net.cpp:112] Top shape: 10 4096 1 1 (40960)
I0218 20:43:25.516566 2099749632 layer_factory.hpp:74] Creating layer fc7
I0218 20:43:25.516576 2099749632 net.cpp:76] Creating Layer fc7
I0218 20:43:25.516582 2099749632 net.cpp:372] fc7 <- fc6
I0218 20:43:25.516589 2099749632 net.cpp:334] fc7 -> fc7
I0218 20:43:25.516599 2099749632 net.cpp:105] Setting up fc7
I0218 20:43:25.604786 2099749632 net.cpp:112] Top shape: 10 4096 1 1 (40960)
I0218 20:43:25.604838 2099749632 layer_factory.hpp:74] Creating layer relu7
I0218 20:43:25.604852 2099749632 net.cpp:76] Creating Layer relu7
I0218 20:43:25.604859 2099749632 net.cpp:372] relu7 <- fc7
I0218 20:43:25.604868 2099749632 net.cpp:323] relu7 -> fc7 (in-place)
I0218 20:43:25.604878 2099749632 net.cpp:105] Setting up relu7
I0218 20:43:25.604885 2099749632 net.cpp:112] Top shape: 10 4096 1 1 (40960)
I0218 20:43:25.604893 2099749632 layer_factory.hpp:74] Creating layer drop7
I0218 20:43:25.604902 2099749632 net.cpp:76] Creating Layer drop7
I0218 20:43:25.604908 2099749632 net.cpp:372] drop7 <- fc7
I0218 20:43:25.604917 2099749632 net.cpp:323] drop7 -> fc7 (in-place)
I0218 20:43:25.604924 2099749632 net.cpp:105] Setting up drop7
I0218 20:43:25.604933 2099749632 net.cpp:112] Top shape: 10 4096 1 1 (40960)
I0218 20:43:25.604939 2099749632 layer_factory.hpp:74] Creating layer fc-rcnn
I0218 20:43:25.604948 2099749632 net.cpp:76] Creating Layer fc-rcnn
I0218 20:43:25.604954 2099749632 net.cpp:372] fc-rcnn <- fc7
I0218 20:43:25.604962 2099749632 net.cpp:334] fc-rcnn -> fc-rcnn
I0218 20:43:25.604971 2099749632 net.cpp:105] Setting up fc-rcnn
I0218 20:43:25.606878 2099749632 net.cpp:112] Top shape: 10 200 1 1 (2000)
I0218 20:43:25.606904 2099749632 net.cpp:165] fc-rcnn does not need backward computation.
I0218 20:43:25.606909 2099749632 net.cpp:165] drop7 does not need backward computation.
I0218 20:43:25.606916 2099749632 net.cpp:165] relu7 does not need backward computation.
I0218 20:43:25.606922 2099749632 net.cpp:165] fc7 does not need backward computation.
I0218 20:43:25.606928 2099749632 net.cpp:165] drop6 does not need backward computation.
I0218 20:43:25.606935 2099749632 net.cpp:165] relu6 does not need backward computation.
I0218 20:43:25.606940 2099749632 net.cpp:165] fc6 does not need backward computation.
I0218 20:43:25.606946 2099749632 net.cpp:165] pool5 does not need backward computation.
I0218 20:43:25.606952 2099749632 net.cpp:165] relu5 does not need backward computation.
I0218 20:43:25.606958 2099749632 net.cpp:165] conv5 does not need backward computation.
I0218 20:43:25.606964 2099749632 net.cpp:165] relu4 does not need backward computation.
I0218 20:43:25.606971 2099749632 net.cpp:165] conv4 does not need backward computation.
I0218 20:43:25.606976 2099749632 net.cpp:165] relu3 does not need backward computation.
I0218 20:43:25.606982 2099749632 net.cpp:165] conv3 does not need backward computation.
I0218 20:43:25.606988 2099749632 net.cpp:165] norm2 does not need backward computation.
I0218 20:43:25.606995 2099749632 net.cpp:165] pool2 does not need backward computation.
I0218 20:43:25.607002 2099749632 net.cpp:165] relu2 does not need backward computation.
I0218 20:43:25.607007 2099749632 net.cpp:165] conv2 does not need backward computation.
I0218 20:43:25.607013 2099749632 net.cpp:165] norm1 does not need backward computation.
I0218 20:43:25.607199 2099749632 net.cpp:165] pool1 does not need backward computation.
I0218 20:43:25.607213 2099749632 net.cpp:165] relu1 does not need backward computation.
I0218 20:43:25.607219 2099749632 net.cpp:165] conv1 does not need backward computation.
I0218 20:43:25.607225 2099749632 net.cpp:201] This network produces output fc-rcnn
I0218 20:43:25.607239 2099749632 net.cpp:446] Collecting Learning Rate and Weight Decay.
I0218 20:43:25.607255 2099749632 net.cpp:213] Network initialization done.
I0218 20:43:25.607262 2099749632 net.cpp:214] Memory required for data: 62425920
E0218 20:43:26.388214 2099749632 upgrade_proto.cpp:618] Attempting to upgrade input file specified using deprecated V1LayerParameter: ../models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel
I0218 20:43:27.089423 2099749632 upgrade_proto.cpp:626] Successfully upgraded file specified using deprecated V1LayerParameter
GPU mode
Loading input...
selective_search_rcnn({'/Users/shelhamer/h/desk/caffe/caffe-dev/examples/images/fish-bike.jpg'}, '/var/folders/bk/dtkn5qjd11bd17b2j36zplyw0000gp/T/tmpakaRLL.mat')
Processed 1570 windows in 102.895 s.
/Users/shelhamer/anaconda/lib/python2.7/site-packages/pandas/io/pytables.py:2453: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed,key->block1_values] [items->['prediction']]

  warnings.warn(ws, PerformanceWarning)
Saved to _temp/det_output.h5 in 0.298 s.

This run was in GPU mode. For CPU mode detection, call detect.py without the --gpu argument.

Running this outputs a DataFrame with the filenames, selected windows, and their detection scores to an HDF5 file. (We only ran on one image, so the filenames will all be the same.)

In [2]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_hdf('_temp/det_output.h5', 'df')
print(df.shape)
print(df.iloc[0])
(1570, 5)
prediction    [-2.62247, -2.84579, -2.85122, -3.20838, -1.94...
ymin                                                     79.846
xmin                                                       9.62
ymax                                                     246.31
xmax                                                    339.624
Name: /Users/shelhamer/h/desk/caffe/caffe-dev/examples/images/fish-bike.jpg, dtype: object

1570 regions were proposed with the R-CNN configuration of selective search. The number of proposals will vary from image to image based on its contents and size -- selective search isn't scale invariant.

In general, detect.py is most efficient when running on a lot of images: it first extracts window proposals for all of them, batches the windows for efficient GPU processing, and then outputs the results. Simply list an image per line in the images_file, and it will process all of them.

Although this guide gives an example of R-CNN ImageNet detection, detect.py is clever enough to adapt to different Caffe models’ input dimensions, batch size, and output categories. You can switch the model definition and pretrained model as desired. Refer to python detect.py --help for the parameters to describe your data set. There's no need for hardcoding.

Anyway, let's now load the ILSVRC13 detection class names and make a DataFrame of the predictions. Note you'll need the auxiliary ilsvrc2012 data fetched by data/ilsvrc12/get_ilsvrc12_aux.sh.

In [3]:

with open('../data/ilsvrc12/det_synset_words.txt') as f:
    labels_df = pd.DataFrame([
        {
            'synset_id': l.strip().split(' ')[0],
            'name': ' '.join(l.strip().split(' ')[1:]).split(',')[0]
        }
        for l in f.readlines()
    ])
labels_df.sort('synset_id')
predictions_df = pd.DataFrame(np.vstack(df.prediction.values), columns=labels_df['name'])
print(predictions_df.iloc[0])
name
accordion      -2.622471
airplane       -2.845788
ant            -2.851219
antelope       -3.208377
apple          -1.949950
armadillo      -2.472935
artichoke      -2.201684
axe            -2.327404
baby bed       -2.737925
backpack       -2.176763
bagel          -2.681061
balance beam   -2.722538
banana         -2.390628
band aid       -1.598909
banjo          -2.298197
...
trombone        -2.582361
trumpet         -2.352853
turtle          -2.360859
tv or monitor   -2.761043
unicycle        -2.218467
vacuum          -1.907717
violin          -2.757079
volleyball      -2.723689
waffle iron     -2.418540
washer          -2.408994
water bottle    -2.174899
watercraft      -2.837425
whale           -3.120338
wine bottle     -2.772960
zebra           -2.742913
Name: 0, Length: 200, dtype: float32

Let's look at the activations.

In [4]:

plt.gray()
plt.matshow(predictions_df.values)
plt.xlabel('Classes')
plt.ylabel('Windows')

Out[4]:

<matplotlib.text.Text at 0x114f15f90>
<matplotlib.figure.Figure at 0x114254b50>

Now let's take max across all windows and plot the top classes.

In [5]:

max_s = predictions_df.max(0)
max_s.sort(ascending=False)
print(max_s[:10])
name
person          1.835771
bicycle         0.866110
unicycle        0.057080
motorcycle     -0.006122
banjo          -0.028209
turtle         -0.189831
electric fan   -0.206788
cart           -0.214235
lizard         -0.393519
helmet         -0.477942
dtype: float32

The top detections are in fact a person and bicycle. Picking good localizations is a work in progress; we pick the top-scoring person and bicycle detections.

In [6]:

# Find, print, and display the top detections: person and bicycle.
i = predictions_df['person'].argmax()
j = predictions_df['bicycle'].argmax()

# Show top predictions for top detection.
f = pd.Series(df['prediction'].iloc[i], index=labels_df['name'])
print('Top detection:')
print(f.order(ascending=False)[:5])
print('')

# Show top predictions for second-best detection.
f = pd.Series(df['prediction'].iloc[j], index=labels_df['name'])
print('Second-best detection:')
print(f.order(ascending=False)[:5])

# Show top detection in red, second-best top detection in blue.
im = plt.imread('images/fish-bike.jpg')
plt.imshow(im)
currentAxis = plt.gca()

det = df.iloc[i]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='r', linewidth=5))

det = df.iloc[j]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='b', linewidth=5))
Top detection:
name
person             1.835771
swimming trunks   -1.150371
rubber eraser     -1.231106
turtle            -1.266037
plastic bag       -1.303265
dtype: float32

Second-best detection:
name
bicycle     0.866110
unicycle   -0.359139
scorpion   -0.811621
lobster    -0.982891
lamp       -1.096808
dtype: float32

Out[6]:

<matplotlib.patches.Rectangle at 0x118576a90>

That's cool. Let's take all 'bicycle' detections and NMS them to get rid of overlapping windows.

In [7]:

def nms_detections(dets, overlap=0.3):
    """
    Non-maximum suppression: Greedily select high-scoring detections and
    skip detections that are significantly covered by a previously
    selected detection.

    This version is translated from Matlab code by Tomasz Malisiewicz,
    who sped up Pedro Felzenszwalb's code.

    Parameters
    ----------
    dets: ndarray
        each row is ['xmin', 'ymin', 'xmax', 'ymax', 'score']
    overlap: float
        minimum overlap ratio (0.3 default)

    Output
    ------
    dets: ndarray
        remaining after suppression.
    """
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    ind = np.argsort(dets[:, 4])

    w = x2 - x1
    h = y2 - y1
    area = (w * h).astype(float)

    pick = []
    while len(ind) > 0:
        i = ind[-1]
        pick.append(i)
        ind = ind[:-1]

        xx1 = np.maximum(x1[i], x1[ind])
        yy1 = np.maximum(y1[i], y1[ind])
        xx2 = np.minimum(x2[i], x2[ind])
        yy2 = np.minimum(y2[i], y2[ind])

        w = np.maximum(0., xx2 - xx1)
        h = np.maximum(0., yy2 - yy1)

        wh = w * h
        o = wh / (area[i] + area[ind] - wh)

        ind = ind[np.nonzero(o <= overlap)[0]]

    return dets[pick, :]

In [8]:

scores = predictions_df['bicycle']
windows = df[['xmin', 'ymin', 'xmax', 'ymax']].values
dets = np.hstack((windows, scores[:, np.newaxis]))
nms_dets = nms_detections(dets)

Show top 3 NMS'd detections for 'bicycle' in the image and note the gap between the top scoring box (red) and the remaining boxes.

In [9]:

plt.imshow(im)
currentAxis = plt.gca()
colors = ['r', 'b', 'y']
for c, det in zip(colors, nms_dets[:3]):
    currentAxis.add_patch(
        plt.Rectangle((det[0], det[1]), det[2]-det[0], det[3]-det[1],
        fill=False, edgecolor=c, linewidth=5)
    )
print 'scores:', nms_dets[:3, 4]
scores: [ 0.86610985 -0.70051557 -1.34796357]

This was an easy instance for bicycle as it was in the class's training set. However, the person result is a true detection since this was not in the set for that class.

You should try out detection on an image of your own next!

(Remove the temp directory to clean up, and we're done.)

In [10]:

!rm -rf _temp
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)

【Caffe】官方例程之R-CNN物体检测 的相关文章

  • 利用lda对文本进行分类_使用lda进行文本分类

    利用lda对文本进行分类 LDA or Latent Dirichlet Allocation is one of the most widely used topic modelling algorithms It is scalable
  • Linux——gcc/g++以及make/Makefile的使用

    简介 在Linux的系统中 想要完成代码编译 gcc g 是不可缺少的工具 而make Makefile能否熟练应用则从一个侧面体现出一个人是否有能力独自完成一个大型工程 而本篇文章就带领大家了解一些gcc g 和make Makefile
  • 层叠上下文(stacking context)

    一 什么是层叠式上下文 层叠上下文 是HTML中的一个三维概念 如果元素具备以下任何一个条件 则该元素会创建一个新的层叠上下文 根元素 z index不为auto的定位元素 二 什么是层叠级别 同一个层叠上下文的背景色以及内部元素 谁在上谁
  • IPsec SA 创建步骤——IKE协议

    IPsec SA 创建步骤概述 IPsec SA creation steps There are two steps on the IPsec SA creation phase 1 is to creat IKE SA and phas
  • 为什么重写equals方法时必须重写hashcode方法

    文章目录 1 与 equals的区别 2 重写equals 3 为什么重写equals方法时必须重写hashcode方法 3 1 Hash算法 3 2 HashCode 相关文章 为什么重写equals方法时必须重写hashcode方法 J
  • 在vue页面中,直接展示代码及样式高亮(vue 中使用 highlight)

    参考链接 https blog csdn net u011364720 article details 90417302 前言 效果如下 想要在前端页面中 直接展示代码的样式 就像一些开发文档的源码展示一样 使用插件 highlight j
  • C++智能指针 shared_ptr,unique_ptr和weak_ptr

    1 智能指针为什么存在 因为C 没有自动回收内存的机制 因此每一次new出来的动态内存必须手动delete回去 因此智能指针可以解决这个问题 2 智能指针的大致描述是什么 智能指针 自动负责释放所指向的对象 实际上它利用了栈的机制 每一个智
  • Go语言基础知识4——依赖管理

    依赖 别人写的库 依赖其进行编译 依赖管理的三个阶段 GOPATH GOVENDOR go mod GOPATH和GOVENDOR正在向go mod迁移 一 GOPATH GOPATH是一个环境 就是一个目录 默认在 go unix lin
  • spring-boot集成spring-brick实现动态插件

    spring boot集成spring brick实现动态插件 spring boot集成spring brick实现动态插件 项目结构 需求实现 spring boot集成spring brick 环境说明 1 主程序集成spring b
  • GnosisSafeProxyFactory合约学习

    GnosisSafe是以太坊区块链上最流行的多签钱包 它的最初版本叫 MultiSigWallet 现在新的钱包叫Gnosis Safe 意味着它不仅仅是钱包了 它自己的介绍为 以太坊上的最可信的数字资产管理平台 The most trus
  • UTXO的定义(交易,输入输出)-1

    在比特币系统中并没有账户的概念 有的是遍布全网区块链的UTXO 所谓UTXO是指关联比特币地址的比特币金额的集合 是一个包含数据和可执行代码的数据结构 一个UTXO的基本单位是 聪 聪 是比特币的最小计量单位 一个比特币等于10 8聪 一个
  • Go标准库日志打印,以及同时输出到控制台和文件

    打印 在使用go写一些小程序时 我们没必要引入额外的包 直接使用fmt标准包打印即可 import fmt func main fmt Println line1 fmt Print line2 fmt Printf line d n 3
  • PTA 剥洋葱(C语言 + 详细注释 + 代码超简单)

    输入格式 一行 一个整数 即图形的层数 输出格式 如上述图形 输入样例 3 输出样例 AAAAA ABBBA ABCBA ABBBA AAAAA 打印图形题关键是找规律 一般只需两重循环 行循环 列循环 include
  • keras_contrib 安装(各种尝试详细过程)

    问题描述 from keras contrib layers import CRF 指示报错没有安装此模块 但直接查找模块找不到 百度的解决方法 第一次尝试 pip install git https www github com kera
  • Git学习笔记(Ubuntu)Git分支

    分支简介 分支创建 git branch xx 查看 git log oneline decorate 分支切换 git checkout xx 查看分叉历史 git log oneline decorate graph all 分支新建与
  • npm -v 报错的解决方案

    如果你尝试了 百度里 所有查到的方法 都没有用 可以试着 在安装完成后 进入 C Users Administrator config configstore 文件夹 清理 update notifier npm json 文件后 再试 也
  • check_hostname requires server_hostname

    代理服务器 Proxy Server 的功能是代理网络用户去取得网络信息 形象地说 它是网络信息的中转站 是个人网络和Internet服务商之间的中间代理机构 负责转发合法的网络信息 对转发进行控制和登记 vpn
  • myslq服务安装启动

    D Program Files x86 MySQL MySQL Server 5 5 bin gt mysqld exe install Service successfully installed D Program Files x86
  • 【MATLAB】三维图像绘制+图形对象句柄

    目录 一 三维曲面绘图 二 图形对象句柄 1 三位图形属性设置 2 动态图 用循环语句 pause 叠加画图 hold on hold on 的基础上刷新画图 set h xdata x ydata y 每次重置点 曲线或曲面 的横纵坐标
  • Could not read Username Error for Git

    在用使用Vundle安装neocomplete时失败 查看log提示could not read Username for https github com Device not configured 经检查这里提示的Username是指n

随机推荐

  • vue3中,this.$router.push刷新当前页面,该页面不重新加载需要手动刷新的解决方案

    1 demo vue 在this router push demo 后 demo vue没有刷新也没有重新加载 但是路由地址已经变了 只是需要手动刷新 2 解决 监听一下当前路由 watch route to from to表示要跳转的路由
  • 固定资产可视化智能管理系统

    什么是资产管理 企业一般有设备 工具 IT通讯 仪器 载具 文档 家具和一些其他的资产 比如某个制造企业现状 共计约有8000个资产分布存在于30个区域 包括实验室 办公室和工厂 其中每年新入资产预估为2000个 报废资产预计1800个 在
  • Python 3D函数图形投影到2D坐标轴上

    1 contourf函数命令 cmap matplotlib cm jet norm matplotlib colors Normalize vmin min np array sol vmax max np array sol plt c
  • 统计学习方法 第七章习题答案

    习题7 1 题目 比较感知机的对偶形式与线性可分支持向量机的对偶形式 解答 感知机 原始形式 min w b
  • linux:linux基础命令(一)

    前言 为什么要学linux 为了运维 项目上线 所以要了解linux操作系统 什么是LNMP linux nginx mysql php小常识 一个用Linux Shell编写的可以为CentOS RadHat Debian Ubuntu
  • 进程和子进程

    进程和子进程 父子进程有独立的数据段 堆 栈 共享代码段 Linux中每个进程都有4G的虚拟地址空间 独立的3G用户空间和共享的1G内核空间 fork 创建的子进程也不例外 子进程资源的由来 1G内核空间既然是所有进程共享 因此fork 创
  • input type=“file”上传文件(一)

    使用input标签 type file 的时候就可以上传文件 为input标签添加change事件 调用函数
  • qt 添加动作 (QAction)

    Qt 5 mainwindow h ifndef MAINWINDOW H define MAINWINDOW H include
  • 3Dgame_homework7

    3D游戏 作业7 智能巡逻兵 要求 游戏设计要求 程序设计要求 提示 相关理论 实现过程与代码 智能巡逻兵 要求 游戏设计要求 创建一个地图和若干巡逻兵 使用动画 每个巡逻兵走一个 3 5 个边的凸多边型 其位置数据是相对地址 即每次确定下
  • 静态库调用动态库或者静态库(Cmake例子)

    1 静态库无论调用动态库还是静态库都只需要include库的头文件 2 要在调用该静态库的地方添加库引用 并设置路径 结论 其实静态库调用动态库或者静态库 只是在用到库方法的地方把该方法添加到LIB当中 真正使用的地方才会把这些库LINK起
  • flutter_blue优化(FlutterBlue.instance.scan搜索重复、搜索结果处理、更新之前保存缓存数据、保存连接成功的设备)

    1 搜索列表优化 FlutterBlue instance scan搜索重复 搜索结果处理 更新之前保存缓存数据 2 保存连接过的设备 3 十进制转十六进制 4 写入十六进制数据 json scan dart 实体类 主要是使用flutte
  • Spring Boot是什么?它的优点是什么?

    Spring Boot是什么 它的优点是什么 Spring Boot是一个基于Spring框架的快速开发框架 它旨在简化Spring应用程序的开发过程和部署流程 Spring Boot提供了自动化配置和约定大于配置的方式 使开发人员可以专注
  • JavaWeb--- Filter(过滤器)学习

    Filter 过滤器 处理中文乱码 登录验证 1 xml依赖
  • 教你用Python做图像处理

    质量 速度 廉价 选择其中两个 提到图像处理第一个想到的库就是PIL 全称Python Imaging Library Python 图像处理类库 它提供了大量的图像操作 比如图像缩放 裁剪 贴图 模糊等等 很多时候它需要配合numpy库一
  • 无盘服务器怎么看使用情况,无盘服务器回写盘查看

    无盘服务器回写盘查看 内容精选 换一换 华为云帮助中心 为用户提供产品简介 价格说明 购买指南 用户指南 API参考 最佳实践 常见问题 视频帮助等技术文档 帮助您快速上手使用华为云服务 由于某些机型的服务器没有配备SDI卡 或者其他服务器
  • 代码随想录一刷-Day09

    LeetCode27 移除元素 public int removeElement int nums int val if nums length 0 return 0 双指针 int slow 0 fast 0 while fast lt
  • Dynamic 356 OP 9.0版本 同服务器环境还原DB导入组织报错

    环境 Dynamic 365 9 0 6 9 PRD和UAT组织在同一套服务器环境中 操作 1 备份PRD数据库 还原到UAT数据库 2 在CRM部署管理器中导入UAT组织数据库 提示错误如下 12 31 12 Verbose Databa
  • 代码简洁3 —— 注释

    前段时间在组织代码review时 提到代码可读性问题时 很多人的第一反应竟然是多添加注释 而我始终觉得注释只能是锦上添花 而不能雪中送炭 再多的注释也改变不了代码逻辑组织混乱的现实 反而过多的注释会加重代码阅读的时间 什么也比不上放置良好的
  • 大端模式(big-endian)与小端模式(little-endian)

    参考自 大端模式与小端模式 大端模式是指数据的低位保存在内存的高地址中 而数据的高位保存在内存的低地址中 小端模式是指数据的低位保存在内存的低地址中 而数据的高位保存在内存的高地址中 大小端模式的由来 在计算机系统中 我们是以字节为单位的
  • 【Caffe】官方例程之R-CNN物体检测

    R CNN is a state of the art detector that classifies region proposals by a finetuned Caffe model For the full details of