Awesome Fine-Grained Image Analysis – Papers, Codes and Datasets
Table of contents
- Introduction
- Tutorials
- Survey papers
- Benchmark datasets
- Fine-grained image recognition
  - Fine-grained recognition by localization-classification subnetworks
    - Employing detection or segmentation techniques
    - Utilizing deep filters / activations
    - Leveraging attention mechanisms
    - Other methods
  - Fine-grained recognition by end-to-end feature encoding
    - High-order feature interactions
    - Specific loss functions
    - Other methods
  - Fine-grained recognition with external information
    - Fine-grained recognition with web data / auxiliary data
    - Fine-grained recognition with multi-modality data
    - Fine-grained recognition with humans in the loop
- Fine-grained image retrieval
  - Content-based fine-grained image retrieval
  - Sketch-based fine-grained image retrieval
- Future directions of FGIA
  - Fine-grained few shot learning
  - Fine-grained hashing
  - Fine-grained domain adaptation
  - Fine-grained image generation
  - FGIA within more realistic settings
- Recognition leaderboard
Introduction
This page lists representative papers, code, and datasets on deep-learning-based fine-grained image analysis (FGIA), including fine-grained image recognition and fine-grained image retrieval. If you have any questions, please feel free to contact Prof. Xiu-Shen Wei.
Tutorials
Survey papers
Benchmark datasets
Summary of popular fine-grained image datasets. "BBox" indicates whether the dataset provides object bounding-box supervision. "Part anno." indicates whether key part localizations are provided. "HRCHY" indicates hierarchical labels. "ATR" indicates attribute labels (e.g., wing color, male, female). "Texts" indicates whether fine-grained text descriptions of the images are supplied.
| Dataset name | Year | Meta-class | Images | Categories | BBox | Part anno. | HRCHY | ATR | Texts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Oxford flower | 2008 | Flowers | 8,189 | 102 |  |  |  |  |  |
| CUB200 | 2011 | Birds | 11,788 | 200 |  |  |  |  |  |
| Stanford Dog | 2011 | Dogs | 20,580 | 120 |  |  |  |  |  |
| Stanford Car | 2013 | Cars | 16,185 | 196 |  |  |  |  |  |
| FGVC Aircraft | 2013 | Aircraft | 10,000 | 100 |  |  |  |  |  |
| Birdsnap | 2014 | Birds | 49,829 | 500 |  |  |  |  |  |
| NABirds | 2015 | Birds | 48,562 | 555 |  |  |  |  |  |
| DeepFashion | 2016 | Clothes | 800,000 | 1,050 |  |  |  |  |  |
| Fru92 | 2017 | Fruits | 69,614 | 92 |  |  |  |  |  |
| Veg200 | 2017 | Vegetables | 91,117 | 200 |  |  |  |  |  |
| iNat2017 | 2017 | Plants & Animals | 859,000 | 5,089 |  |  |  |  |  |
| RPC | 2019 | Retail products | 83,739 | 200 |  |  |  |  |  |

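
As a rough illustration of dataset scale, the image and category counts in the table above can be turned into an average number of images per category. The following is a minimal Python sketch: the `DATASETS` dictionary and the `images_per_category` helper are names introduced here for illustration only and do not belong to any dataset toolkit. The counts are copied from the table.

```python
# Image and category counts copied from the benchmark table above.
# DATASETS and images_per_category are illustrative names, not a real API.
DATASETS = {
    "Oxford flower": (8_189, 102),
    "CUB200": (11_788, 200),
    "Stanford Dog": (20_580, 120),
    "Stanford Car": (16_185, 196),
    "FGVC Aircraft": (10_000, 100),
    "Birdsnap": (49_829, 500),
    "NABirds": (48_562, 555),
    "DeepFashion": (800_000, 1_050),
    "Fru92": (69_614, 92),
    "Veg200": (91_117, 200),
    "iNat2017": (859_000, 5_089),
    "RPC": (83_739, 200),
}


def images_per_category(images: int, categories: int) -> float:
    """Return the mean number of images per category."""
    return images / categories


# Print the datasets from fewest to most images per category.
ranked = sorted(DATASETS.items(), key=lambda kv: images_per_category(*kv[1]))
for name, (images, categories) in ranked:
    print(f"{name:<14} {images_per_category(images, categories):8.1f}")
```

The result is only a mean. It says nothing about how images are distributed across categories within a dataset.
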
Fine-grained image recognition
Fine-grained recognition by localization-classification subnetworks
Employing detection or segmentation techniques
Utilizing deep filters / activations
Leveraging attention mechanisms
Other methods
Fine-grained recognition by end-to-end feature encoding
High-order feature interactions
Specific loss functions
Other methods
Fine-grained recognition with external information
Fine-grained recognition with web data / auxiliary data
Fine-grained recognition with multi-modality data
Fine-grained recognition with humans in the loop
Fine-grained image retrieval
Content-based fine-grained image retrieval
Sketch-based fine-grained image retrieval
- Sketch Me That Shoe. Qian Yu, Feng Liu, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales, and Chen Change Loy. CVPR, 2016.
Future directions of FGIA
Fine-grained few shot learning
Fine-grained hashing
Fine-grained domain adaptation
Fine-grained image generation
FGIA within more realistic settings
Recognition leaderboard
This section is continually updated. Because CUB200-2011 is the most widely used fine-grained dataset, the leaderboard below reports fine-grained recognition accuracy on it.
| Method | Published | BBox? | Part? | External information? | Base model | Image resolution | Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PB R-CNN | ECCV 2014 |  |  |  | Alex-Net | 224x224 | 73.9% |
| MaxEnt | NeurIPS 2018 |  |  |  | GoogLeNet | TBD | 74.4% |
| PB R-CNN | ECCV 2014 |  |  |  | Alex-Net | 224x224 | 76.4% |
| PS-CNN | CVPR 2016 |  |  |  | CaffeNet | 454x454 | 76.6% |
| MaxEnt | NeurIPS 2018 |  |  |  | VGG-16 | TBD | 77.0% |
| Mask-CNN | PR 2018 |  |  |  | Alex-Net | 448x448 | 78.6% |
| PC | ECCV 2018 |  |  |  | ResNet-50 | TBD | 80.2% |
| DeepLAC | CVPR 2015 |  |  |  | Alex-Net | 227x227 | 80.3% |
| MaxEnt | NeurIPS 2018 |  |  |  | ResNet-50 | TBD | 80.4% |
| Triplet-A | CVPR 2016 |  |  | Manual labour | GoogLeNet | TBD | 80.7% |
| Multi-grained | ICCV 2015 |  |  | WordNet etc. | VGG-19 | 224x224 | 81.7% |
| Krause et al. | CVPR 2015 |  |  |  | CaffeNet | TBD | 82.0% |
| Multi-grained | ICCV 2015 |  |  | WordNet etc. | VGG-19 | 224x224 | 83.0% |
| TS | CVPR 2016 |  |  |  | VGGD+VGGM | 448x448 | 84.0% |
| Bilinear CNN | ICCV 2015 |  |  |  | VGGD+VGGM | 448x448 | 84.1% |
| STN | NeurIPS 2015 |  |  |  | GoogLeNet+BN | 448x448 | 84.1% |
| LRBP | CVPR 2017 |  |  |  | VGG-16 | 224x224 | 84.2% |
| PDFS | CVPR 2016 |  |  |  | VGG-16 | TBD | 84.5% |
| Xu et al. | ICCV 2015 |  |  | Web data | CaffeNet | 224x224 | 84.6% |
| Cai et al. | ICCV 2017 |  |  |  | VGG-16 | 448x448 | 85.3% |
| RA-CNN | CVPR 2017 |  |  |  | VGG-19 | 448x448 | 85.3% |
| MaxEnt | NeurIPS 2018 |  |  |  | Bilinear CNN | TBD | 85.3% |
| PC | ECCV 2018 |  |  |  | Bilinear CNN | TBD | 85.6% |
| CVL | CVPR 2017 |  |  | Texts | VGG | TBD | 85.6% |
| Mask-CNN | PR 2018 |  |  |  | VGG-16 | 448x448 | 85.7% |
| GP-256 | ECCV 2018 |  |  |  | VGG-16 | 448x448 | 85.8% |
| KP | CVPR 2017 |  |  |  | VGG-16 | 224x224 | 86.2% |
| T-CNN | IJCAI 2018 |  |  |  | ResNet | 224x224 | 86.2% |
| MA-CNN | ICCV 2017 |  |  |  | VGG-19 | 448x448 | 86.5% |
| MaxEnt | NeurIPS 2018 |  |  |  | DenseNet-161 | TBD | 86.5% |
| DeepKSPD | ECCV 2018 |  |  |  | VGG-19 | 448x448 | 86.5% |
| OSME+MAMC | ECCV 2018 |  |  |  | ResNet-101 | 448x448 | 86.5% |
| StackDRL | IJCAI 2018 |  |  |  | VGG-19 | 224x224 | 86.6% |
| DFL-CNN | CVPR 2018 |  |  |  | VGG-16 | 448x448 | 86.7% |
| Bi-Modal PMA | IEEE TIP 2020 |  |  |  | VGG-16 | 448x448 | 86.8% |
| PC | ECCV 2018 |  |  |  | DenseNet-161 | TBD | 86.9% |
| KERL | IJCAI 2018 |  |  | Attributes | VGG-16 | 224x224 | 87.0% |
| HBP | ECCV 2018 |  |  |  | VGG-16 | 448x448 | 87.1% |
| Mask-CNN | PR 2018 |  |  |  | ResNet-50 | 448x448 | 87.3% |
| DFL-CNN | CVPR 2018 |  |  |  | ResNet-50 | 448x448 | 87.4% |
| NTS-Net | ECCV 2018 |  |  |  | ResNet-50 | 448x448 | 87.5% |
| HSnet | CVPR 2017 |  |  |  | GoogLeNet+BN | TBD | 87.5% |
| Bi-Modal PMA | IEEE TIP 2020 |  |  |  | ResNet-50 | 448x448 | 87.5% |
| CIN | AAAI 2020 |  |  |  | ResNet-50 | 448x448 | 87.5% |
| MetaFGNet | ECCV 2018 |  |  | Auxiliary data | ResNet-34 | TBD | 87.6% |
| Cross-X | CVPR 2020 |  |  |  | ResNet-50 | 448x448 | 87.7% |
| DCL | CVPR 2019 |  |  |  | ResNet-50 | 448x448 | 87.8% |
| ACNet | CVPR 2020 |  |  |  | VGG-16 | 448x448 | 87.8% |
| TASN | CVPR 2019 |  |  |  | ResNet-50 | 448x448 | 87.9% |
| ACNet | CVPR 2020 |  |  |  | ResNet-50 | 448x448 | 88.1% |
| CIN | AAAI 2020 |  |  |  | ResNet-101 | 448x448 | 88.1% |
| DBTNet-101 | NeurIPS 2019 |  |  |  | ResNet-101 | 448x448 | 88.1% |
| Bi-Modal PMA | IEEE TIP 2020 |  |  | Texts | VGG-16 | 448x448 | 88.2% |
| GCL | AAAI 2020 |  |  |  | ResNet-50 | 448x448 | 88.3% |
| S3N | CVPR 2020 |  |  |  | ResNet-50 | 448x448 | 88.5% |
| Sun et al. | AAAI 2020 |  |  |  | ResNet-50 | 448x448 | 88.6% |
| FDL | AAAI 2020 |  |  |  | ResNet-50 | 448x448 | 88.6% |
| Bi-Modal PMA | IEEE TIP 2020 |  |  | Texts | ResNet-50 | 448x448 | 88.7% |
| DF-GMM | CVPR 2020 |  |  |  | ResNet-50 | 448x448 | 88.8% |
| PMG | ECCV 2020 |  |  |  | VGG-16 | 550x550 | 88.8% |
| FDL | AAAI 2020 |  |  |  | DenseNet-161 | 448x448 | 89.1% |
| PMG | ECCV 2020 |  |  |  | ResNet-50 | 550x550 | 89.6% |
| API-Net | AAAI 2020 |  |  |  | DenseNet-161 | 512x512 | 90.0% |
| Ge et al. | CVPR 2019 |  |  |  | GoogLeNet+BN | Shorter side is 800 px | 90.3% |
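
Because the leaderboard mixes base models, it can help to compare entries that share a backbone. The following is a minimal Python sketch of that grouping, using only a handful of rows copied from the leaderboard above. `ROWS` and `best_per_base_model` are names introduced here for illustration. The sketch ignores the image-resolution column, so entries evaluated at different resolutions (for example 448x448 versus 550x550) are not strictly like for like.

```python
# (method, base model, accuracy in %) for a few rows copied from the
# leaderboard above. This is a hand-picked subset, not the full table.
ROWS = [
    ("DCL", "ResNet-50", 87.8),
    ("TASN", "ResNet-50", 87.9),
    ("PMG", "ResNet-50", 89.6),
    ("ACNet", "VGG-16", 87.8),
    ("PMG", "VGG-16", 88.8),
    ("FDL", "DenseNet-161", 89.1),
    ("API-Net", "DenseNet-161", 90.0),
]


def best_per_base_model(rows):
    """Map each base model to its highest-accuracy (method, accuracy) pair."""
    best = {}
    for method, base_model, accuracy in rows:
        # Keep the entry only if it beats the current best for this backbone.
        if base_model not in best or accuracy > best[base_model][1]:
            best[base_model] = (method, accuracy)
    return best


for base_model, (method, accuracy) in best_per_base_model(ROWS).items():
    print(f"{base_model:<13} {method:<8} {accuracy:.1f}%")
```
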