Chapter 5
Transfer Learning and Computer Vision in PyTorch
Objectives
Learn PyTorch basics
Extract features from pre-trained networks and train SVM classifiers using the features
Fine-tune the networks to classify different datasets
5.1 PyTorch
PyTorch [23] is an open-source machine learning library that is particularly useful for deep learning.
First, PyTorch provides automatic differentiation: if we write our code using PyTorch functions, we
can obtain the derivatives without any additional derivation or code. This saves us from having to
implement any backward functions for deep networks. Secondly, PyTorch comes with many func-
tions and classes for common deep learning layers, optimizers, and loss functions. Lastly, PyTorch
can efficiently run computations on either the CPU or GPU.
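As a minimal illustration of automatic differentiation (a sketch assuming PyTorch 0.4 or later),
the following computes a derivative with a single call to backward:

import torch

# track operations on x so gradients can be computed
x = torch.tensor([2.0], requires_grad=True)
y = x ** 3 + 2 * x          # y = x^3 + 2x

# automatic differentiation: no hand-written backward code needed
y.backward()
print(x.grad)               # tensor([14.]), since dy/dx = 3x^2 + 2 = 14 at x = 2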
5.1.1 Installation
To install PyTorch run the following commands in the Linux terminal:
pip install https://download.pytorch.org/whl/cpu/torch-1.0.1.post2-cp27-cp27mu-linux_x86_64.whl
pip install torchvision
The first command installs the basic PyTorch package, and the second installs torchvision
which includes functions, classes, and models useful for deep learning for computer vision.
5.1.2 Tutorial
To start we ask you to complete the following tutorial:
http://pytorch.org/tutorials/beginner/pytorch_with_examples.html
5.2 CIFAR100 Example in PyTorch
Next, as an example, we will re-implement the neural network from Chapter 4 using PyTorch instead
of the library we built. The code for this example is in the included cifar_pytorch.py file.
PyTorch has a module called nn that contains implementations of the most common layers used
for neural networks. If you look at the code for some layer (for example linear.py), you will see
that it is similar to our implementation from Chapter 3. There is a forward method and an __init__
method. There is no need for a backward method because PyTorch uses automatic differentiation.
In this example, we will implement our model as a class with forward, __init__, fit, and predict
methods. The __init__ method simply sets up our layers using the layer types in the nn package.
In the forward pass we pass the data through our layers and return the output. Note that we can
reuse the pool layer, since this layer has no learnable parameters. Also instead of a ReLU layer, we
can use the F.relu function in our forward pass. Similarly, instead of a “flatten” layer, we can just
use PyTorch’s view to reshape the data.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Conv2d args: (input depth, output depth, filter size)
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.conv2 = nn.Conv2d(16, 32, 3)
        # Linear args: (# of input units, # of output units)
        self.fc1 = nn.Linear(1152, 5)
        # MaxPool2d args: (kernel size, stride)
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 1152)  # -1 means the batch dimension is inferred
        x = self.fc1(x)
        return x
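To sanity-check the layer sizes (for example, that the flattened size is indeed 1152 for a
32 × 32 input), we can run a dummy batch through the network; this is only a quick sketch:

net = Net()
dummy = torch.randn(4, 3, 32, 32)   # a batch of 4 CIFAR-sized images
out = net(dummy)
print(out.shape)                    # torch.Size([4, 5]): one score per class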
Fit function Next we will implement a fit function. Here we implement the fit function as a
class method, similar to Chapter 3; it is also common to see the training code implemented outside
of the model class in a separate function. The fit function is very similar to our own fit function
from Chapter 3. First we set an optimization criterion and an optimizer. Next we loop for a number
of epochs. In each epoch, we use PyTorch's DataLoader class to loop through the data in batches;
the DataLoader automatically takes care of splitting the data into batches, and we access them
with a simple for loop. For each batch we zero the previously calculated gradients using the
optimizer's zero_grad method. Then we call the forward function, compute the loss, call the
backward function, and perform one optimization step. The optimization step is similar to the
update_params methods we used in previous chapters.
def fit(self, trainloader):
    # switch to train mode
    self.train()
    # define loss function
    criterion = nn.CrossEntropyLoss()
    # set up SGD
    optimizer = optim.SGD(self.parameters(), lr=0.1, momentum=0.0)
    for epoch in range(20):  # loop over the dataset multiple times
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # get the inputs
            inputs, labels = data
            # zero the parameter gradients
            optimizer.zero_grad()
            # compute forward pass
            outputs = self.forward(inputs)
            # get loss function
            loss = criterion(outputs, labels)
            # do backward pass
            loss.backward()
            # do one gradient step
            optimizer.step()
            # accumulate statistics
            running_loss += loss.item()
        print('[Epoch: %d] loss: %.3f' %
              (epoch + 1, running_loss / (i + 1)))
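With fit defined, training only requires wrapping the data in a DataLoader and calling the
method. The tensor names below are assumptions standing in for the 5-class CIFAR100 subset
used in Chapter 4:

from torch.utils.data import TensorDataset, DataLoader

# assumed: train_X is an (N, 3, 32, 32) float tensor and train_y an (N,)
# long tensor holding the images and labels of the 5-class subset
trainset = TensorDataset(train_X, train_y)
trainloader = DataLoader(trainset, batch_size=32, shuffle=True)

net = Net()
net.fit(trainloader)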
Finally, it is also useful to provide a predict function to run our model on some test data. This
predict function will also use a PyTorch DataLoader.
def predict(self, testloader):
    # switch to evaluate mode
    self.eval()
    correct = 0
    total = 0
    all_predicted = []
    with torch.no_grad():
        for images, labels in testloader:
            outputs = self.forward(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            all_predicted += predicted.numpy().tolist()
    print('Accuracy on test images: %d %%' % (
        100 * correct / total))
    return all_predicted
Notice that we call train() before training and eval() before prediction. The train() method sets
the module in training mode and eval() sets it in evaluation mode. These two methods only affect
certain modules: they control layers like Dropout and BatchNorm, which behave differently during
training and evaluation. For simple neural network structures like those in Chapter 3 and Chapter 4,
we do not have any layers that are affected by self.train() or self.eval(), so it is okay to omit
them. But it is a good habit to always add them when implementing a neural network.
We also use torch.no_grad() when doing prediction. This context manager, introduced in PyTorch
version 0.4, disables gradient calculation. You can use it when you are sure that you will not call
the backward function; it reduces both memory consumption and computation.
To evaluate the two models we look at both the final classification accuracy and the confusion
matrix. The rows of the confusion matrix show the predicted class and the columns show the
actual class. This lets us analyze the patterns of misclassifications. The included util.py provides
the function plot_confus_matrix, which will plot this matrix. Figure 5.1 shows the confusion
matrix for this CIFAR100 example.
5.3 Transfer Learning
Transfer learning is when we use a model trained on one set of data, and adapt it to another set
of data. For image datasets, transfer learning works because many features (e.g. edges) are useful
across different image datasets. Transfer learning using neural networks trained on large image
datasets is highly successful, and can easily be competitive with other approaches that do not use
deep learning. Transfer learning also does not require a huge amount of data, since the pre-trained
initialization is a good starting point.
In this chapter, we will consider two types of transfer learning: a feature-extraction based method
and a fine-tuning based method. We will be using networks that have been pre-trained on the
ImageNet dataset [24], and adapt them for different datasets.
Figure 5.1: CIFAR100 confusion matrix.
5.3.1 Datasets
We have collected five datasets for use in this chapter (Table 5.1). You will be assigned one of these
datasets to work on. The datasets are subsets of existing datasets. Figure 5.2 shows example images
from these datasets. You will not get credit if you download and submit a model trained on the full
version of these datasets. We want to fine-tune models that were originally trained on the ImageNet
dataset.
With transfer learning we alleviate some of the problems with using small datasets. Typically
if we tried to train a network from scratch on a small dataset, we might experience overfitting
problems. But in transfer learning, we start with some network trained on a much larger dataset.
Because of this, the features from the pre-trained network are not likely to overfit our data, yet still
likely to be useful for classification.
Table 5.1: Datasets for Transfer Learning in PyTorch
Dataset Description # Categories # Train / # Test
Animals iNat2017 challenge [25] 9 1384 / 466
Faces Labeled faces in the wild dataset [26] 31 1021 / 439
Places Places dataset [27] 9 1437 / 367
Household iMaterialist 2018 challenge [28] 9 1383 / 375
Caltech101 CVPR 2004 Workshop [29] 30 1500 / 450
5.3.2 Base Networks
Table 5.2 shows the base networks that we will consider in this chapter. All of these networks have
PyTorch versions trained on the ImageNet dataset [24], but the architectures of the networks vary.
The input size and the last layer input size also vary among the networks.
As an example we will be using DenseNet [30] to explain how to do fine-tuning in PyTorch. In
PyTorch we can load a pre-trained DenseNet model with the command:
Figure 5.2: Example images from datasets for transfer learning in PyTorch: (a) Household, (b) Animals, (c) Places, (d) Faces, (e) Caltech101.
import torchvision.models
model = torchvision.models.densenet121(pretrained=True)
It is important to use the pretrained=True argument. Otherwise, the model will be initialized
with random weights.
Table 5.2: Base Networks for Transfer Learning in PyTorch
Model Year Input Size Last layer input size PyTorch model
AlexNet [20] 2012 224 × 224 4096 torchvision.models.alexnet
VGG16 [22] 2014 224 × 224 4096 torchvision.models.vgg16
ResNet18 [18] 2016 224 × 224 512 torchvision.models.resnet18
Inception v3 [31] 2015 299 × 299 2048 torchvision.models.inception_v3
DenseNet121 [30] 2017 224 × 224 1024 torchvision.models.densenet121
5.3.3 Pre-processing
It is important that we pre-process the images before sending them to the network. Most deep
networks normalize their input images to zero mean and unit standard deviation before training.
If at test time we pass an image that is not normalized in the same way (with respect to the
training data), the network output won't be useful; the network “expects” normalized input.
PyTorch's torchvision includes a transforms module that implements the common transformations
used in pre-processing, including normalization:
import torchvision.transforms as transforms
For normalization we can use the built-in PyTorch transform Normalize. The values used for
normalization are computed from the images in the ImageNet dataset; each image channel has its
own mean and standard deviation:
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
To feed our data to the pretrained network, we must resize the images to the input size the network
expects. For AlexNet, VGG16, ResNet18, and DenseNet this is 224 × 224 pixels, but for Inception
it is 299 × 299 pixels. There are different ways to ensure our input is the correct size. The simplest
is to resize the input directly to the correct size; this has the disadvantage of potentially stretching
the image if there is an aspect ratio mismatch. An alternative is to resize the image so that its
minimum side length equals the required size, and then crop out the extra part of the image to get
the desired size. For this tutorial, we use the first method with PyTorch's Resize transform:
resize = transforms.Resize((224, 224))
In preprocessing we would like to apply these transformations in a pipeline to every image.
PyTorch includes a useful function called Compose to combine the transformations into a single
object representing the pipeline:
preprocessor = transforms.Compose([
    resize,
    transforms.ToTensor(),
    normalize,
])
Note the presence of ToTensor() which is needed to convert from the image representation to
the PyTorch tensor representation. We can easily add other transformations to this preprocessor
object to do more preprocessing, or to add data augmentation.
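Applied to a single PIL image, the pipeline produces a normalized tensor ready for the network.
The file name below is hypothetical:

from PIL import Image

img = Image.open('example.jpg').convert('RGB')   # hypothetical image file
tensor = preprocessor(img)     # float tensor of shape (3, 224, 224)
batch = tensor.unsqueeze(0)    # add a batch dimension: (1, 3, 224, 224)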
5.3.4 Feature Extractor + SVM
First we will use the base network as a feature extractor. This means that we simply run the images
through the pre-trained base network, and take outputs from layer(s) as a feature representation
of the image. These features are usually good for classification with a shallow machine learning
algorithm. In this case we will use an SVM for classification.
To extract features we need to stop the forward pass after a certain layer. As of this writing,
PyTorch doesn't offer a single call to extract the output of an arbitrary layer. However, it is easy
to modify a model to return the output of a certain layer; exactly how depends on the implementation
of the model. We recommend you look at the source code for the networks, available at
https://github.com/pytorch/vision/tree/master/torchvision/models.
The PyTorch AlexNet and VGG models are split into a feature-extractor stage and a classifier
stage. The feature extractor consists of the convolutional layers, and the classifier consists of the
fully connected layers. As an example, suppose we want to extract the output of the layer before
the last fully connected layer. The easiest way to do this is to modify the sequential classifier part
of the model to drop its last layer:
new_classifier = nn.Sequential(*list(model.classifier.children())[:-1])
model.classifier = new_classifier
If we are using a model that does not use the feature extractor/classifier decomposition, we need
to modify the forward pass of the model to only compute up to the requested layer. For example, for
DenseNet, we can comment out the last layer to extract the features before the final fully connected
layer.
def forward(self, x):
    features = self.features(x)
    out = F.relu(features, inplace=True)
    # average pool features and reshape to (batch size, feature size)
    out = F.avg_pool2d(out, kernel_size=7, stride=1).view(features.size(0), -1)
    # out = self.classifier(out)  # commented out to get the features instead
    return out
Next we need some way to load the data from our dataset. Again PyTorch provides some
convenient tools to do this. We will use the datasets.ImageFolder class to load our dataset. The
ImageFolder loader assumes a file structure of our data where each of the classes is stored in a
separate folder. Next we will use the torch.utils.data.DataLoader class to create a loader that
can be used to loop through our data with some batch size. We need separate loaders for the
training data and the testing data. A loader can be constructed with:
from torchvision import datasets

loader = torch.utils.data.DataLoader(
    datasets.ImageFolder(data_dir, preprocessor),
    batch_size=batch_size,
    shuffle=True)
With the loader set, we can now loop through the data and extract features. During looping we
can use Python’s enumerate function to keep track of the batch index. The loader returns a tuple
with the data and the target label.
In the case of testing, we don't need to feed the labels to the network, but it is useful to save
them so that we can later compute the SVM classification accuracy on the test set. To use the
input data in our PyTorch model, we can wrap it as a PyTorch Variable (in PyTorch 0.4 and later,
plain tensors work directly and torch.no_grad() replaces the volatile flag):
for i, (in_data, target) in enumerate(loader):
    # volatile=True is the pre-0.4 way to disable gradient tracking;
    # in PyTorch >= 0.4 wrap the loop in torch.no_grad() instead
    input_var = torch.autograd.Variable(in_data, volatile=True)
    output = model(input_var)
With the extracted features for each sample, we use scikit-learn's SVM model. To review how
to use it, revisit Chapter 2. It is up to you to determine the type of SVM and the best
hyperparameters.
We can assess the accuracy of the SVM model using the classification accuracy on the entire
testing set. In addition, we can plot a confusion matrix, which gives more information about the
errors the model makes. util.py includes the function plot_confus_matrix, which will plot this
matrix.
Important:
The plot_confus_matrix function plots a confusion matrix. Note that there is an input
parameter size to indicate how many classes you want to show in the plot. The default
value for size is None, in which case it plots the whole confusion matrix. If size is given
an integer value, for example 9, it plots the confusion matrix for class 0 through class 8.
To make the result easier to read for datasets with many classes, such as Faces and Caltech101,
please plot the confusion matrix for the first 9 classes. For the other three datasets, use the
default setting to plot the whole confusion matrix.
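A call might look like the following sketch; the exact signature is defined in the included
util.py, so the argument names here are assumptions:

from util import plot_confus_matrix

# assumed signature: (true labels, predicted labels, size)
plot_confus_matrix(test_labels, predictions, size=9)  # first 9 classes only
plot_confus_matrix(test_labels, predictions)          # whole matrix (size=None)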
5.3.5 Fine-tuning
The feature extractor + SVM approach already may give decent results for your problem. This is
because, for certain problems, the intermediate representations learned by the pre-trained network
can be very useful for the new problem. However, even if the pre-trained filters are giving good
performance, we may be able to achieve even greater performance by allowing the parameters of the
pre-trained model to adapt to our new dataset. This adaptation process is called fine-tuning.
Performing fine-tuning is exactly the same process as performing training. Refer to the included
cifar_pytorch.py to see how to train in PyTorch. The main difference in this case is that we want
to start from the pre-trained model. We can use the same DataLoader and transforms that we
used for feature extraction. During fine-tuning we usually use a very small learning rate: we want
to adapt the existing filters to our data, but not move the parameters far from the pre-trained
values.
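One additional step is replacing the final layer, since the pre-trained head predicts the 1000
ImageNet classes rather than the classes of the new dataset. A sketch for DenseNet121, assuming
a 9-class dataset (the layer sizes follow Table 5.2; the learning rate and momentum below are
example values, and choosing them is up to you):

import torch.nn as nn
import torch.optim as optim
import torchvision.models

num_classes = 9   # depends on your assigned dataset (Table 5.1)
model = torchvision.models.densenet121(pretrained=True)
# DenseNet121's final layer takes 1024 input features (Table 5.2)
model.classifier = nn.Linear(1024, num_classes)

criterion = nn.CrossEntropyLoss()
# small learning rate so the pre-trained filters are only gently adapted
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)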
During fine-tuning we can speed up the process by running the model on the GPU. In PyTorch
this can be accomplished by using the .cuda() command on the model and loss function. For
example:
model = model.cuda()
criterion = criterion.cuda()
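The input batches must be moved to the GPU as well. Inside the training loop, this looks
roughly like:

# move each batch to the GPU before the forward pass
inputs, labels = inputs.cuda(), labels.cuda()
outputs = model(inputs)
loss = criterion(outputs, labels)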
Finally, after training, we would like to save the model so that we can use it in the future without
training again. We can use the model.save(filename) function to save the model.
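If your model class does not define such a save method, the standard PyTorch approach is
torch.save; saving the state_dict (just the learned parameters) is the recommended form. The
file name below is only an example:

import torch

# save only the learned parameters
torch.save(model.state_dict(), 'finetuned_model.pth')

# later: rebuild the architecture, then load the weights
model.load_state_dict(torch.load('finetuned_model.pth'))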
For the hands-on exercises in this chapter, we do not specify the hyperparameters for you to
use. It is up to you to choose a good set of hyperparameters. Examples of hyperparameters that
can be changed are the learning rate, batch size, and number of epochs.
Data Augmentation To achieve higher performance, we can experiment with data augmentation.
Data augmentation is the process of slightly perturbing the input images to generate more
samples than were originally available. Possible augmentations include image rotation, scaling,
grayscale transformation, and so on.
PyTorch includes transformations useful for data augmentation in the torchvision.transforms
module. As part of the deliverables, you need to add some of these data augmentation methods to
your preprocessing object to achieve greater test accuracy.
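For example, a training-time preprocessing pipeline with two common augmentations might look
like the sketch below. It reuses the normalize object defined earlier; the specific transformations
and their parameters are up to you:

import torchvision.transforms as transforms

train_preprocessor = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),   # randomly flip half the images
    transforms.RandomRotation(10),       # rotate by up to +/- 10 degrees
    transforms.ToTensor(),
    normalize,
])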
Deliverable: Transfer Learning with PyTorch
Depending on your task number, you are assigned a different dataset and model to perform
transfer learning (Table 5.3).
Provide the test accuracy and confusion matrices for the considered networks as feature
extractors for the dataset; name the confusion matrix plot conf_feature.png
Provide the test accuracy and confusion matrices for the considered fine-tuned network
for the dataset; name the confusion matrix plot conf_finetune.png
Submit the trained model saved using the model.save function
Submit code for both the feature extraction based method and the fine-tuning based
method, named feature.py and finetune.py
Submit the code for the fine-tuning based method with data augmentation, as well as
the saved trained model
Additional Resources
PyTorch Documentation - http://pytorch.org/docs/stable/index.html
TorchVision Documentation - http://pytorch.org/docs/master/torchvision/index.html
PyTorch tutorials - http://pytorch.org/tutorials/
PyTorch Github - https://github.com/pytorch/pytorch
TorchVision Github - https://github.com/pytorch/vision
PyTorch Discussion Forum - https://discuss.pytorch.org/
Task Assignment
Table 5.3: Sample tasks for transfer learning in PyTorch
Task number Dataset Model
1 Animals AlexNet
2 Animals Inception
3 Animals ResNet18
4 Animals VGG16
5 Places AlexNet
6 Places Inception
7 Places ResNet18
8 Places VGG16
9 Faces AlexNet
10 Faces Inception
11 Faces ResNet18
12 Faces VGG16
13 Household AlexNet
14 Household Inception
15 Household ResNet18
16 Household VGG16
17 Caltech101 AlexNet
18 Caltech101 Inception
19 Caltech101 ResNet18
20 Caltech101 VGG16
Submission Instructions
Submit your work as a folder named FIRSTNAME_LASTNAME_TASKNUMBER_CH5 and zip the folder for
submission. The plots and saved models that you generated while performing the hands-on assign-
ments in this chapter should be placed in a folder called results. The grading rubric is shown
in Table 5.4. There are no unit tests for the submitted code, but your code will be graded by
attempting to run it.
Table 5.4: Grading rubric
Points Description
NOTE: DO NOT SUBMIT THE DATASET
35 Working code for feature extraction based method (feature.py)
35 Working code for fine-tuning based method (finetune.py)
10 Fine-tuning code using data augmentation
10 Test accuracies and confusion matrices for the assigned dataset
and model
10 Submit saved models (two in total)
Total 100
Bibliography
[1] “Numpy reference.” https://docs.scipy.org/doc/numpy-1.13.0/reference/, 2017.
[2] “Scipy reference.” https://docs.scipy.org/doc/scipy-1.0.0/reference/, 2017.
[3] “Matplotlib pyplot reference.” https://matplotlib.org/api/pyplot_api.html, 2017.
[4] J. Johnson, “Python numpy tutorial.” http://cs231n.github.io/python-numpy-tutorial/,
2017.
[5] “Luma coding in video systems.” https://en.wikipedia.org/wiki/Grayscale#Luma_coding_in_video_systems, 2017.
[6] Wikipedia, “DFT matrix — Wikipedia, the free encyclopedia.” http://en.wikipedia.org/w/index.php?title=DFT%20matrix&oldid=811427639, 2017.
[7] J. L. Bentley, “Multidimensional binary search trees used for associative searching,” Commu-
nications of the ACM, vol. 18, no. 9, pp. 509–517, 1975.
[8] D. R. Cox, “The regression analysis of binary sequences,” Journal of the Royal Statistical
Society. Series B (Methodological), pp. 215–242, 1958.
[9] M. Aly, “Survey on multiclass classification methods,” Neural Networks, 2005.
[10] T. P. Minka, “A comparison of numerical optimizers for logistic regression,” 2003.
[11] C. Cortes and V. Vapnik, “Support-vector networks,” Machine learning, vol. 20, no. 3, pp. 273–
297, 1995.
[12] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM Transactions
on Intelligent Systems and Technology, vol. 2, pp. 27:1–27:27, 2011. Software available at
http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[13] H. Yu and S. Kim, “SVM tutorial: classification, regression and ranking,” in Handbook of
Natural computing, pp. 479–506, Springer, 2012.
[14] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
[15] J. Johnson, “Backpropagation for a Linear Layer.” http://cs231n.stanford.edu/handouts/linear-backprop.pdf. [Online; accessed 22-Dec-2017].
[16] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural
networks,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence
and Statistics, pp. 249–256, 2010.
[17] V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” in
Proceedings of the 27th international conference on machine learning (ICML-10), pp. 807–814,
2010.
[18] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Pro-
ceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778,
2016.
[19] A. Krizhevsky, “Learning multiple layers of features from tiny images,” Technical Report, 2009.
[20] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional
neural networks,” in Advances in neural information processing systems, pp. 1097–1105, 2012.
[21] A. Lavin and S. Gray, “Fast algorithms for convolutional neural networks,” in Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4013–4021, 2016.
[22] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recog-
nition,” arXiv preprint arXiv:1409.1556, 2014.
[23] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison,
L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” 2017.
[24] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy,
A. Khosla, M. Bernstein, et al., “Imagenet large scale visual recognition challenge,” Interna-
tional Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
[25] “iNaturalist challenge at FGVC 2017.” https://www.kaggle.com/c/inaturalist-challenge-at-fgvc-2017. Accessed: 2018-04-11.
[26] E. Learned-Miller, G. B. Huang, A. RoyChowdhury, H. Li, and G. Hua, “Labeled faces in the
wild: A survey,” in Advances in face detection and facial image analysis, pp. 189–248, Springer,
2016.
[27] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva, “Learning deep features for scene
recognition using places database,” in Advances in neural information processing systems,
pp. 487–495, 2014.
[28] “iMaterialist challenge at FGVC 2018.” https://www.kaggle.com/c/imaterialist-challenge-furniture-2018. Accessed: 2018-04-11.
[29] L. Fei-Fei, R. Fergus, and P. Perona, “Learning generative visual models from few training
examples: An incremental bayesian approach tested on 101 object categories,” Computer vision
and Image understanding, vol. 106, no. 1, pp. 59–70, 2007.
[30] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional
networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
2017.
[31] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and
A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), pp. 1–9, 2015.