Differentiating Images of Dogs and Cats using a Convolutional Neural Network and Google Cloud's Deep Learning VM Image

Mark Dodd and Michael Ellsworth

Abstract

This project explores methods for building a Convolutional Neural Network (CNN) to differentiate between images of cats and dogs. This somewhat trivial task for humans is a complicated and time-consuming process for a desktop computer. Exploiting the resources available on Google Cloud, this project will test a number of different CNNs in order to achieve a target accuracy of 90% or greater.

Packages

This project will lean heavily on the open-source CNN infrastructure available via TensorFlow's high-level API, Keras. Keras gives us the ability to build and train deep learning models such as CNNs, which will ultimately be used to differentiate images of cats and dogs. In addition to Keras, the typical Python data science stack will be used, including pandas, numpy and matplotlib.

In [2]:
import os
import glob
import pathlib
import itertools
import pickle
import time
import math
from PIL import Image
import multiprocessing as mp

import pandas as pd
import numpy as np
from statistics import mean, median, mode
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.metrics import confusion_matrix, roc_curve, precision_recall_curve

import tensorflow as tf
from tensorflow import keras
import tensorflow.keras.backend as K
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, Dropout, MaxPooling2D, BatchNormalization, Activation, GlobalAveragePooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator
AUTOTUNE = tf.data.experimental.AUTOTUNE

%matplotlib inline
sns.set()
mpl.rcParams['figure.dpi'] = 100
mpl.rcParams['axes.titlesize'] = 18
mpl.rcParams['axes.labelsize'] = 14

#TRAIN_PATH = (r'/home/mark/Dodd/software/Python/data/uofc/data608/project/' 
#              + 'dogs-vs-cats/train/')
TRAIN_PATH = r'data/train/'

Data Set

Background

The dataset that will be used to train and test the CNNs in this project is publicly available at Kaggle. As with the majority of Kaggle datasets, it consists of a training set and a testing set. However, this project will train and test the CNNs via the training set only, as the testing set is unlabelled and exists for submitting the final model to Kaggle for judging. The testing set is therefore unable to serve the purpose of a typical machine learning testing set.
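For reproducibility, the dataset can be fetched with the Kaggle CLI. This is a sketch assuming the kaggle package is installed and an API token is configured at ~/.kaggle/kaggle.json; the archive layout may differ slightly:

!kaggle competitions download -c dogs-vs-cats
!unzip -q dogs-vs-cats.zip          # contains train.zip and test1.zip
!unzip -q train.zip -d data/        # yields data/train/*.jpg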

The size of the training data is approximately 850 MB, which, in our opinion, is sizeable enough to constitute a Big Data problem, especially considering that once the data is read into memory, the array for each image takes up much more space than the compressed .jpg file does. This project will test that assumption by running a handful of CNNs on a local machine and comparing the training durations against those of machines provisioned on Google Cloud. This topic is explored further in later sections of the project.
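As a rough back-of-envelope illustration of that blow-up, using the modal image size and the on-disk total measured below:

# one 374x500 RGB image decoded to a uint8 array:
374 * 500 * 3      # = 561,000 bytes, roughly 0.54 MB in memory
# versus ~24 KB per file on disk on average (595 MB / 25,000 files)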

A Quick Exploration of the Data

We want to take a moment to illustrate the actual size of our data set and the distribution of image dimensions, which we do with a histogram and density plot of the x and y dimensions. We use multiprocessing to perform this processing, cycling through 1 to 8 cores to ascertain the benefit of multiprocessing for this task.

In [3]:
def image_information(file):
    img = np.array(Image.open(file))
    return img.shape[0], img.shape[1], img.nbytes
In [10]:
results = {'cores': [], 'chunksize': [], 'time': [], 'size': []}
for cpu in range(1, mp.cpu_count() + 1):
    for cs in [1, 8, 16, 24, 32]:
        x_dims = []
        y_dims = []
        num_bytes = 0
        data_dir = pathlib.Path(TRAIN_PATH)
        files = data_dir.glob('*.jpg')

        begin = time.perf_counter()
        with mp.Pool(processes = cpu) as p:
            for x, y, n in p.imap_unordered(image_information, files, chunksize = cs):
                x_dims.append(x)
                y_dims.append(y)
                num_bytes += n
        end = time.perf_counter()

        r = (cs, end - begin)
        print('Number of processors in use is {} of {}. '.format(cpu, mp.cpu_count()) + 'chunksize = {}, time = {:.4f} s'.format(*r) )
        #print("cs = {}, {} s.".format(*r))
        results['cores'].append(cpu)
        results['chunksize'].append(cs)
        results['time'].append(r[1])
        results['size'].append(num_bytes)

        results_df = pd.DataFrame(results)
print("mode x = {}, mean x = {:.2f}, median x = {}".format(mode(x_dims), mean(x_dims), median(x_dims)))
print("mode y = {}, mean y = {:.2f}, median y = {}".format(mode(y_dims), mean(y_dims), median(y_dims)))
Number of processors in use is 1 of 8. chunksize = 1, time = 35.2806 s
Number of processors in use is 1 of 8. chunksize = 8, time = 33.4802 s
Number of processors in use is 1 of 8. chunksize = 16, time = 33.4808 s
Number of processors in use is 1 of 8. chunksize = 24, time = 33.4829 s
Number of processors in use is 1 of 8. chunksize = 32, time = 33.2797 s
Number of processors in use is 2 of 8. chunksize = 1, time = 18.1611 s
Number of processors in use is 2 of 8. chunksize = 8, time = 16.9562 s
Number of processors in use is 2 of 8. chunksize = 16, time = 16.8560 s
Number of processors in use is 2 of 8. chunksize = 24, time = 16.8528 s
Number of processors in use is 2 of 8. chunksize = 32, time = 16.8544 s
Number of processors in use is 3 of 8. chunksize = 1, time = 12.2567 s
Number of processors in use is 3 of 8. chunksize = 8, time = 11.5560 s
Number of processors in use is 3 of 8. chunksize = 16, time = 11.3529 s
Number of processors in use is 3 of 8. chunksize = 24, time = 11.4552 s
Number of processors in use is 3 of 8. chunksize = 32, time = 11.4517 s
Number of processors in use is 4 of 8. chunksize = 1, time = 9.5606 s
Number of processors in use is 4 of 8. chunksize = 8, time = 8.8560 s
Number of processors in use is 4 of 8. chunksize = 16, time = 8.7560 s
Number of processors in use is 4 of 8. chunksize = 24, time = 8.7566 s
Number of processors in use is 4 of 8. chunksize = 32, time = 8.6543 s
Number of processors in use is 5 of 8. chunksize = 1, time = 8.8663 s
Number of processors in use is 5 of 8. chunksize = 8, time = 8.2619 s
Number of processors in use is 5 of 8. chunksize = 16, time = 8.1637 s
Number of processors in use is 5 of 8. chunksize = 24, time = 8.1634 s
Number of processors in use is 5 of 8. chunksize = 32, time = 8.2635 s
Number of processors in use is 6 of 8. chunksize = 1, time = 8.3733 s
Number of processors in use is 6 of 8. chunksize = 8, time = 7.7717 s
Number of processors in use is 6 of 8. chunksize = 16, time = 7.6675 s
Number of processors in use is 6 of 8. chunksize = 24, time = 7.6691 s
Number of processors in use is 6 of 8. chunksize = 32, time = 7.7702 s
Number of processors in use is 7 of 8. chunksize = 1, time = 7.9810 s
Number of processors in use is 7 of 8. chunksize = 8, time = 7.3785 s
Number of processors in use is 7 of 8. chunksize = 16, time = 7.2767 s
Number of processors in use is 7 of 8. chunksize = 24, time = 7.2794 s
Number of processors in use is 7 of 8. chunksize = 32, time = 7.2747 s
Number of processors in use is 8 of 8. chunksize = 1, time = 7.6906 s
Number of processors in use is 8 of 8. chunksize = 8, time = 6.9827 s
Number of processors in use is 8 of 8. chunksize = 16, time = 6.9831 s
Number of processors in use is 8 of 8. chunksize = 24, time = 6.9817 s
Number of processors in use is 8 of 8. chunksize = 32, time = 6.8831 s
mode x = 374, mean x = 360.48, median x = 374.0
mode y = 500, mean y = 404.10, median y = 447.0
In [11]:
!du -sh data/train
595M	data/train
In [12]:
fig, ax = plt.subplots(figsize=(12,8))
sns.barplot(data=results_df, x='cores', y='time', hue='chunksize', ax=ax, palette=sns.color_palette("Blues_d"), edgecolor = 'k')
plt.title("Time to Process 25,000 Images vs. Core and Chunksize", size = 18)
plt.annotate("Files are 595MB on drive, {:.1f} GB in memory.\nReadings taken after files stored in disk cache.".format(num_bytes/(1024**3)), xy = (4,10))
plt.xlabel("Number of Cores", size = 14)
plt.ylabel("Time (s)")
plt.show()
In [7]:
fig, ax = plt.subplots(figsize=(12,8))
sns.distplot(x_dims, bins = 30, label = 'x-dim')
sns.distplot(y_dims, bins = 30, label = 'y-dim')
plt.xlabel('Size (pixels)', size = 14)
plt.title('Distribution of Image Sizes in Dog vs. Cats', size = 18)
plt.xlim((0,600))
plt.legend()
plt.show()

From the above plot we see a broad, left-skewed distribution of image sizes in our data set. We decided to rescale each image to 150x150 pixels to reduce both the data size and the model size, with the aim of making the model easier to build and train.
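A rough sizing estimate shows what the rescale buys us, assuming float32 pixels (as produced by convert_image_dtype below) and the mean dimensions found above:

# an average-sized image (360 x 404) as float32:
360 * 404 * 3 * 4    # ~1.75 MB per image
# after resizing to 150 x 150:
150 * 150 * 3 * 4    # = 270,000 bytes, ~0.27 MB per image, roughly 6.5x smaller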

Structuring the training data

The training data available at Kaggle consists of 25,000 images: 12,500 images of dogs and 12,500 of cats. Each image file is labelled either dog.n.jpg or cat.n.jpg with n numbered from 1 to 12,500. In order to effectively train a CNN, the 25,000 training images need to be split into three image subsets: training, validation and testing. We decided that each of these subsets should contain an equal number of cat and dog images. The following set of functions splits the training data into the three subsets accordingly (the expected subset sizes are worked out below).
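With the default split fractions used in the functions below (test_size = 0.1, validation_size = 0.15), the subset sizes work out as follows, matching the sizes printed later:

# 25,000 images split 75% / 15% / 10%:
25000 * 0.75    # = 18,750 training images
25000 * 0.15    # =  3,750 validation images
25000 * 0.10    # =  2,500 test images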

In [8]:
# Function 1 - train_val_test
# Creates a function to split a list into the three image subsets; training, validation and testing

def train_val_test(img_list, test_size = 0.1, validation_size = 0.15, random_state = 42):
    ''' Split a list into a training, validation and test set
        Parameters: 
            img_list - a list of img file paths
            test_size - 0-1.0 - defines the test size as percentage of the list size
            validation_size - 0-1.0 - defines the validation size as percentage of the list size
        Returns:
            train_list
            validation_list
            test_list 
    '''
    np.random.seed(random_state)
    imgs_shuffled = img_list.copy()
    np.random.shuffle(imgs_shuffled)

    train_size = 1 - test_size - validation_size
    train_ind = int(len(imgs_shuffled)*train_size)
    val_ind = int(len(imgs_shuffled)*validation_size) + train_ind

    return imgs_shuffled[:train_ind], imgs_shuffled[train_ind:val_ind], imgs_shuffled[val_ind:]

# Function 2 - train_val_test_combined
# Creates a function that will combine the three image subsets for dogs and cats
# This function is required to reconstruct the 25,000 images with an equal amount of dogs and cats in each subset

def train_val_test_combined(dogs, cats, test_size = 0.1, validation_size = 0.15, random_state = 42):
    ''' Split a list into a training, validation and test set
        Parameters: 
            dogs - a list of img file paths
            cats - a list of img file paths
            test_size - 0-1.0 - defines the test size as percentage of the list size
            validation_size - 0-1.0 - defines the validation size as percentage of the list size
        Returns:
            train_list - combined dog and cat list
            validation_list - combined dog and cat list
            test_list - combined dog and cat list
    '''
    dog_train, dog_validation, dog_test = train_val_test(dogs, test_size, validation_size, random_state)
    cat_train, cat_validation, cat_test = train_val_test(cats, test_size, validation_size, random_state)

    train = dog_train + cat_train
    val = dog_validation + cat_validation
    test = dog_test + cat_test

    np.random.seed(random_state)
    np.random.shuffle(train)
    np.random.shuffle(val)
    np.random.shuffle(test)

    return train, val, test

# Function 3 - create_train_val_test
# Creates a function to separate the cats and dogs files and combine them into the three image subsets using
# Function 1 and Function 2

def create_train_val_test(train_path):
    #train_files = glob.glob(train_path + '*')
    train_cat_files = glob.glob(train_path + 'cat*')
    train_dog_files = glob.glob(train_path + 'dog*')

    return train_val_test_combined(train_dog_files, train_cat_files)

Data Cleaning

There are a number of data cleaning steps that are required prior to feeding the images into Keras to create a CNN. These data cleaning steps include:

  • Assigning a label to each image, in this case, 1 being a dog and 0 being a cat
  • Decoding the .jpg file into a 3D uint8 tensor that assigns 3 numbers to each of the image's pixels based on the amount of red, green and blue in each pixel
  • Converting the typical colour coding number format from a 0 to 255 scale to a 0 to 1 scale.

Additionally, an important issue to note about the dataset is that the images are inconsistent in size. Since Keras takes the pixel length and pixel width as features of the image, if the images are left as is, Keras would be unable to train a CNN since the features of each image would be inconsistent. This would be equivalent to creating a linear regression model where each observation has a different number of features. As a result, the images will need to be re-sized to a consistent pixel length and width. This project will resize the images to a 150 by 150 pixel length and width.

The following set of functions were built to complete the aforementioned data cleaning steps. The code was adapted from the following TensorFlow tutorial. After the functions are created, the three subsets of image data can be constructed.

In [9]:
# Constants used to define the image height and width to resize each image consistently
IMG_HEIGHT = 150
IMG_WIDTH = 150
BATCH_SIZE = 32

# Function 1 - get_label
# Creates a function to extract the label from each .jpg file in the Kaggle dataset

def get_label(file_path):
    # convert the path to a list of path components
    parts = tf.strings.split(file_path, os.path.sep)

    # get the cat / dog component of the file name
    cat_or_dog = tf.strings.split(parts[-1], '.')[0]

    return int(cat_or_dog == 'dog')

# Function 2 - decode_img
# Creates a function to decode the .jpg file into a 3D uint8 tensor with a consistent height and width

def decode_img(img):
    # convert the compressed string to a 3D uint8 tensor
    img = tf.image.decode_jpeg(img, channels=3)

    # Use `convert_image_dtype` to convert to floats in the [0,1] range.
    img = tf.image.convert_image_dtype(img, tf.float32)

    # resize the image to the desired size.
    return tf.image.resize(img, [IMG_HEIGHT, IMG_WIDTH])

# Function 3 - process_path
# Creates a function to extract the label using the get_label function and assign it to a tensor using
# the decode_img function

def process_path(file_path):
    label = get_label(file_path)
    img = tf.io.read_file(file_path)
    img = decode_img(img)
    return img, label

# Function 4 - build_dataset
# Creates a function to construct the dataset of images from a list of files.
# This function runs the process_path function in parallel

def build_dataset(file_list):
    # convert a list of file paths into a dataset of (image, label) pairs
    ds = tf.data.Dataset.from_tensor_slices(file_list)
    ds = ds.map(process_path, num_parallel_calls=AUTOTUNE) # parallel routine
    return ds

# Function 5 - ds_len
# Creates a straightforward function to pull the length of a tensorflow dataset

def ds_len(ds):
    ''' get length of a tensorflow dataset '''
    return tf.data.experimental.cardinality(ds).numpy()

Data Augmentation

Next, we write some functions to create data augmentations. These augmentations will randomly perturb each of the images in our data set, which, in a sense, artificially expands our training data set.

In [10]:
# functions modified and inspired by:
#    https://www.wouterbulten.nl/blog/tech/data-augmentation-using-tensorflow-data-dataset/

def img_flip(img, label):
    return tf.image.random_flip_left_right(img), label


def img_color(img, label):
    img = tf.image.random_hue(img, 0.08)
    img = tf.image.random_saturation(img, 0.6, 1.6)
    img = tf.image.random_brightness(img, 0.05)
    img = tf.image.random_contrast(img, 0.7, 1.3)
    return img, label


# Chris Deotte - https://www.kaggle.com/cdeotte/rotation-augmentation-gpu-tpu-0-96
def get_mat(rotation, shear, height_zoom, width_zoom, height_shift, width_shift):
    # returns 3x3 transform matrix which transforms indicies

    # CONVERT DEGREES TO RADIANS
    rotation = math.pi * rotation / 180.
    shear = math.pi * shear / 180.

    # ROTATION MATRIX
    c1 = tf.math.cos(rotation)
    s1 = tf.math.sin(rotation)
    one = tf.constant([1],dtype='float32')
    zero = tf.constant([0],dtype='float32')
    rotation_matrix = tf.reshape( tf.concat([c1,s1,zero, -s1,c1,zero, zero,zero,one],axis=0),[3,3] )

    # SHEAR MATRIX
    c2 = tf.math.cos(shear)
    s2 = tf.math.sin(shear)
    shear_matrix = tf.reshape( tf.concat([one,s2,zero, zero,c2,zero, zero,zero,one],axis=0),[3,3] )

    # ZOOM MATRIX
    zoom_matrix = tf.reshape( tf.concat([one/height_zoom,zero,zero, zero,one/width_zoom,zero, zero,zero,one],axis=0),[3,3] )

    # SHIFT MATRIX
    shift_matrix = tf.reshape( tf.concat([one,zero,height_shift, zero,one,width_shift, zero,zero,one],axis=0),[3,3] )

    return K.dot(K.dot(rotation_matrix, shear_matrix), K.dot(zoom_matrix, shift_matrix))


# Chris Deotte - https://www.kaggle.com/cdeotte/rotation-augmentation-gpu-tpu-0-96
def transform(image,label):
    # input image - is one image of size [dim,dim,3] not a batch of [b,dim,dim,3]
    # output - image randomly rotated, sheared, zoomed, and shifted
    DIM = IMG_HEIGHT
    XDIM = DIM%2 #fix for size 331

    rot = 15. * tf.random.normal([1],dtype='float32')
    shr = 5. * tf.random.normal([1],dtype='float32')
    h_zoom = 1.0 + tf.random.normal([1],dtype='float32')/10.
    w_zoom = 1.0 + tf.random.normal([1],dtype='float32')/10.
    h_shift = 16. * tf.random.normal([1],dtype='float32')
    w_shift = 16. * tf.random.normal([1],dtype='float32')

    # GET TRANSFORMATION MATRIX
    m = get_mat(rot,shr,h_zoom,w_zoom,h_shift,w_shift)

    # LIST DESTINATION PIXEL INDICES
    x = tf.repeat( tf.range(DIM//2,-DIM//2,-1), DIM )
    y = tf.tile( tf.range(-DIM//2,DIM//2),[DIM] )
    z = tf.ones([DIM*DIM],dtype='int32')
    idx = tf.stack( [x,y,z] )

    # ROTATE DESTINATION PIXELS ONTO ORIGIN PIXELS
    idx2 = K.dot(m,tf.cast(idx,dtype='float32'))
    idx2 = K.cast(idx2,dtype='int32')
    idx2 = K.clip(idx2,-DIM//2+XDIM+1,DIM//2)

    # FIND ORIGIN PIXEL VALUES           
    idx3 = tf.stack( [DIM//2-idx2[0,], DIM//2-1+idx2[1,]] )
    d = tf.gather_nd(image,tf.transpose(idx3))

    return tf.reshape(d,[DIM,DIM,3]),label


def ds_augment(ds):
    # create a list of augmentation functions
    augmentations = [img_flip, img_color, transform]

    # map each augmentation function to the dataset in parallel
    for f in augmentations:
        ds = ds.map(f, num_parallel_calls=AUTOTUNE)

    # Make sure that the values are still in [0, 1]
    ds = ds.map(lambda x, label: (tf.clip_by_value(x, 0, 1), label), num_parallel_calls=AUTOTUNE)

    return ds

Build Datasets

The following functions use the previously defined helpers to convert the files into training / validation / test TensorFlow datasets. We create both an augmented and a non-augmented version of the training set to compare the performance gains realized by augmenting the data.

In [11]:
# Function 1 - prepare_for_training
# Creates a function to feed data into the CNN in a random order:
#   - cache the dataset in memory
#   - shuffle it fully
#   - repeat the dataset as needed
#   - feed data out at the batch size
#   - prefetch the next batch so it is ready when required, to speed processing

def prepare_for_training(ds,
                         cache = True,
                         shuffle = False,
                         augment = False,
                         repeat = True,
                         prefetch = True,
                         batch_size = BATCH_SIZE):

    # we always want to cache, but this is set up to be generic in case you wouldn't want to
    if cache:
        if isinstance(cache, str):
            ds = ds.cache(cache)
        else:
            ds = ds.cache()

    # we don't always want to shuffle (validation / test)
    if shuffle:
        ds = ds.shuffle(buffer_size=ds_len(ds), reshuffle_each_iteration = True)

    if repeat:
        ds = ds.repeat() # always repeat

    # we will only augment the training set
    if augment:
        ds = ds_augment(ds)

    ds = ds.batch(batch_size)

    if prefetch:
        ds = ds.prefetch(buffer_size=AUTOTUNE) # fetch a batch in the background

    return ds


# Function 2 - build_labelled_datasets
# Using the create_train_val_test function from the previous code chunk, build_labelled_datasets
# creates a function to convert the 25,000 .jpg images into a Keras readable format and split them
# into training, validation and testing data subsets

def build_labelled_datasets(path):
    train_files, validation_files, test_files = create_train_val_test(path)

    labeled_train_ds = build_dataset(train_files)
    labeled_val_ds = build_dataset(validation_files)
    labeled_test_ds = build_dataset(test_files)

    # display the lengths of the three sets
    print('Train size = {}, Validation Size = {}, Test Size = {}'
          .format(ds_len(labeled_train_ds), ds_len(labeled_val_ds), ds_len(labeled_test_ds)))

    return labeled_train_ds, labeled_val_ds, labeled_test_ds

# Function 3 - prepare_datasets
# Creates a function that runs prepare_for_training on each subset, producing augmented
# and unaugmented training sets along with the validation and test sets

def prepare_datasets(train, val, test):

    train_ds = prepare_for_training(train, shuffle = True, augment = False)    # unaugmented training set
    train_aug_ds = prepare_for_training(train, shuffle = True, augment = True) # augmented training set
    val_ds = prepare_for_training(val)      # no need to shuffle
    test_ds = prepare_for_training(test,
                                   cache = True,
                                   shuffle = False,
                                   augment = False,
                                   repeat = False,
                                   prefetch = False,
                                   batch_size = ds_len(labeled_test_ds))

    return train_ds, train_aug_ds, val_ds, test_ds
In [12]:
# Create the training, validation and testing TensorFlow datasets for input into Keras
labeled_train_ds, labeled_val_ds, labeled_test_ds = build_labelled_datasets(TRAIN_PATH)
train_ds, train_aug_ds, val_ds, test_ds = prepare_datasets(labeled_train_ds, labeled_val_ds, labeled_test_ds)
Train size = 18750, Validation Size = 3750, Test Size = 2500

Visualizing the data

After constructing the three TensorFlow datasets, we can now visualize what these images look like. Looking at 3 images from each of the subsets, we can confirm that the data has been re-sized appropriately and is ready for input into Keras.

In [13]:
# Function 1 - plot_n_imgs_tf
# Creates a function to view images in a dataset

def plot_n_imgs_tf(ds, n = 6, title = None):
    ncols = 3
    nrows = (n - 1) // ncols + 1
    figw = 20
    fig, axs = plt.subplots(nrows,
                            ncols,
                            figsize=(figw, figw / ncols * nrows), squeeze=False)
    imgs = [img for img, lab in ds.take(n)]
    for i, (ax, img) in enumerate(zip(axs.flatten(), imgs)):
        ax.grid(False)
        ax.set_xticks([])
        ax.set_yticks([])
        ax.imshow(img, interpolation='bilinear')
        if title is not None and i % 2 == 1: # assumes there are 3 columns
            ax.set_title(title, color = 'dimgrey', size = 18)
In [14]:
# View 3 images from the training dataset
plot_n_imgs_tf(labeled_train_ds, 3, "Training Set")

# View 3 images from the validation dataset
plot_n_imgs_tf(labeled_val_ds, 3, "Validation Set")

# View 3 images from the testing dataset
plot_n_imgs_tf(labeled_test_ds, 3, "Test Set")

Modeling

Setting Constants in the Convolutional Neural Network

For a binary classification problem, there are a few steps in the CNN that will stay consistent throughout the model testing phase of this project. This includes the model compile and fit steps. In order to keep these parameters consistent, a number of functions were created to limit the repetitive coding that would be required as models are tested.

The initial parameters of the compile method are:

  • loss = 'binary_crossentropy', a loss function typical for binary classification problems
  • optimizer = 'adam', a typical optimizer for CNNs
  • metrics = ['accuracy'], to log the accuracy of the CNN

The initial parameters of the fit method are:

  • epochs = 5, the number of complete passes made over the training data
  • steps_per_epoch = the number of training steps in each epoch, equal to the training data length divided by BATCH_SIZE (defined earlier, alongside the image dimensions, as 32); a worked example follows this list
  • validation_steps = the validation data length divided by BATCH_SIZE
  • keras.callbacks = TensorBoard, ModelCheckpoint, CSVLogger and EarlyStopping
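As a quick sanity check of those step counts, here is a sketch using the subset sizes printed earlier:

import math
steps_per_epoch = math.ceil(18750 / 32)     # 586 training steps per epoch
validation_steps = math.ceil(3750 / 32)     # 118 validation steps per epoch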

As mentioned above, using keras.callbacks, a function has been created to log and view features of the model. These callbacks include:

  • TensorBoard: a visualization tool to graph training and testing metrics
  • ModelCheckpoint: saves the model to a file after each epoch, configured here to keep only the best model
  • CSVLogger: saves a csv of the epoch results
  • EarlyStopping: stops the training process after 10 unimproved epochs

Finally, a function was created to reset the weights of a model in order to re-train a model from scratch.

Each of these functions is created in the following code chunk.

In [15]:
# Function 1 - get_log_file
# This function will return the path of the file to be saved using the keras.callbacks
# used for ModelCheckPoint, TensorBoard and CSVLogger callbacks

def get_log_file(model_name, log_type):
    '''
        log_type: one of "train_logs", "models", "history"
    '''
    root_log_dir = os.path.join(os.curdir, "logs", log_type)
    if log_type != 'train_logs':
        pathlib.Path(root_log_dir).mkdir(parents=True, exist_ok=True)
        if log_type == 'models':
            file_type = 'h5'
        elif log_type == 'history':
            file_type = 'csv'
        log_id = time.strftime(root_log_dir + '/' + model_name + "_%Y_%m_%d-%H_%M-%S." + file_type)
    else:
        run_id = time.strftime(model_name + "_run_%Y_%m_%d-%H_%M-%S")
        log_id = os.path.join(root_log_dir, run_id)
    return log_id

# Function 2 - compile_model
# Creates a function to set constant parameters of the Keras `compile` method

def compile_model(model, loss = "binary_crossentropy", optimizer = 'adam', metrics = ['accuracy'], learning_rate = None):
    if learning_rate is not None:
        optimizer = keras.optimizers.Adam(learning_rate = learning_rate)
    model.compile(loss = loss, optimizer = optimizer, metrics = metrics)


# Function 3 - fit_model
# Creates a function to set constant parameters of the Keras `fit` method

def fit_model(model, epochs = 5,
              pre_trained = False,
              display_summary = False,
              batch_size = BATCH_SIZE,
              early_stoppage = 10,
              learning_rate = None,
              augment = False,
              lr_schedule = False,
              lr_schedule_epochs = 20,
              lr_decay = 0.05,
              reset = False,
              verbose = 1):

    name = model._name

    # This function keeps the learning rate at 0.001 for the first N epochs
    # and decreases it exponentially after that.
    def scheduler(epoch):
        if learning_rate is not None:
            rate = learning_rate
        else:
            rate = 0.001

        if epoch < lr_schedule_epochs:
            return rate
        else:
            return rate * tf.math.exp(lr_decay * (lr_schedule_epochs - epoch))

    tensorboard_cb = keras.callbacks.TensorBoard(get_log_file(name, "train_logs"))
    checkpoint_cb = keras.callbacks.ModelCheckpoint(get_log_file(name, "models"), save_best_only = True, monitor = 'val_accuracy')
    csv_logger_cb = keras.callbacks.CSVLogger(get_log_file(name, "history"), append=True)
    learning_rate_cb = keras.callbacks.LearningRateScheduler(scheduler)

    callbacks = [tensorboard_cb, checkpoint_cb, csv_logger_cb]

    # if we want to schedule the learning rate simply append it to the callbacks list
    if lr_schedule:
        callbacks.append(learning_rate_cb)

    if early_stoppage:
        early_stoppage_cb = keras.callbacks.EarlyStopping(patience = early_stoppage, restore_best_weights = True, monitor = 'val_accuracy')
        callbacks.append(early_stoppage_cb)


    STEPS_PER_EPOCH = np.ceil(ds_len(labeled_train_ds) / batch_size)
    validation_steps = np.ceil(ds_len(labeled_val_ds) / batch_size)

    if not pre_trained:
        compile_model(model, learning_rate = learning_rate)

    if display_summary:
        print(model.summary())

    if augment:
        training_ds = train_aug_ds
    else:
        training_ds = train_ds

    if reset:
        reset_weights(model)

    history = model.fit(training_ds,
                        steps_per_epoch = STEPS_PER_EPOCH,
                        validation_data = val_ds,
                        validation_steps = validation_steps,
                        epochs=epochs,
                        callbacks = callbacks,
                        verbose = verbose)

    return history


# Function 4 - reset_weights
# Creates a function to reset model weights in order to ensure training occurs on a model from scratch
def reset_weights(model):
    # https://github.com/keras-team/keras/issues/341
    # reset all layer weights to allow retraining from random
    for layer in model.layers:
        if isinstance(layer, tf.keras.Model): #if you're using a model as a layer
            reset_weights(layer) #apply function recursively
            continue

        #where are the initializers?
        if hasattr(layer, 'cell'):
            init_container = layer.cell
        else:
            init_container = layer

        for key, initializer in init_container.__dict__.items():
            if "initializer" not in key: #is this item an initializer?
                  continue #if no, skip it

            # find the corresponding variable, like the kernel or the bias
            if key == 'recurrent_initializer': #special case check
                var = getattr(init_container, 'recurrent_kernel')
            else:
                var = getattr(init_container, key.replace("_initializer", ""))

            var.assign(initializer(var.shape, var.dtype))
            #use the initializer

Model Definitions

The following section outlines the different models created to differentiate images of dogs and cats. Each model was designed to test a parameter or feature of the layers typically used in CNNs. Prior to testing these parameters / features, a default convolutional 2D layer is initialized to avoid repetitive code in the model testing phase. This default layer uses:

  • kernel_size = 3
  • activation = 'relu'
  • padding = 'SAME'
  • kernel_initializer = 'he_normal'

These parameters are adjusted in some of the convolutional 2D layers but are for the most part consistent amongst the different model tests. Additionally, the following layers remain consistent throughout each of the tested models:

  • Conv2D
  • MaxPooling2D
  • Flatten
  • Dense
In [16]:
from functools import partial

# Create a default convolutional 2D layer for repeat use in model testing

DefaultConv2D = partial(keras.layers.Conv2D,
                        kernel_size=3,
                        activation='relu',
                        padding="SAME",
                        kernel_initializer='he_normal')

Model 1 - The "Simple" CNN

The purpose of Model 1 was to test the basic CNN-related Keras layers and start building a relatively simple model. As the project evolved, the model's intention became to determine the highest achievable validation accuracy using the fewest possible parameters in a neural network. This model also serves as a useful baseline for evaluating the relative improvements from adding layers and features in future models.

The model consists of:

  • 5 x Conv2D layers
  • 3 x MaxPooling2D layers
  • 1 x Flatten layer
  • 1 x Dense layer

The model is built in the following code cell.

In [17]:
model_1 = Sequential([Conv2D(8, kernel_size = 2, padding='same', activation='relu', input_shape=(IMG_HEIGHT, IMG_WIDTH ,3)),
                      MaxPooling2D(2),
                      Conv2D(16, kernel_size = 2, activation='relu'),
                      Conv2D(16, kernel_size = 2, activation='relu'),
                      MaxPooling2D(2),
                      Conv2D(32, kernel_size = 2, activation='relu'),
                      Conv2D(32, kernel_size = 2, activation='relu'),
                      MaxPooling2D(2),
                      Flatten(),
                      Dense(1, activation = 'sigmoid')
])
model_1._name = 'model_1'

Model 1 is trained in the next code cell. Here we'll train for up to 50 epochs, stopping after 5 consecutive epochs with no improvement in validation accuracy.

In [18]:
model_1_history = fit_model(model_1, epochs = 50, display_summary=False, early_stoppage = 5, verbose = 0)
print(r"Maximum validation accuracy of Model 1 is: " + str(max(model_1_history.history['val_accuracy'])))
Maximum validation accuracy of Model 1 is: 0.7971398

Based on the results of Model 1's training, this model achieves a validation accuracy of roughly 80%, with each epoch training in approximately 7 seconds. This is a reasonably good validation accuracy and computation time for our first attempt at a Convolutional Neural Network, but there are plenty of other parameters to be tested. The next model will add some additional complexity.

Model 2 - The "Complex" CNN

Model 2 builds on Model 1 by incorporating additional layers and filters. The intention of this model was to test how some added complexity could improve the model accuracy. Additionally, 2 more Dense layers were added, each followed by a Dropout layer. The purpose of the Dropout layers is to prevent overfitting: with the additional Dense layers, a CNN tends to "memorize" the training data, and a Dropout layer randomly removes a subset of "neurons" in the network, preventing this "memorization". In this model, 40% of the neurons are dropped following each of the first two Dense layers (a short illustration of the mechanism follows).
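As a minimal sketch of that mechanism (illustrative only, separate from the model code), Keras' Dropout zeroes a random fraction of activations during training and rescales the survivors:

import tensorflow as tf

drop = tf.keras.layers.Dropout(0.4)
x = tf.ones([1, 10])
# training=True: ~40% of the values become 0; the rest are scaled by 1 / (1 - 0.4)
print(drop(x, training=True).numpy())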

The model consists of:

  • 5 x Conv2D layers
  • 3 x MaxPooling2D layers
  • 1 x Flatten layer
  • 3 x Dense layers
  • 2 x Dropout layers

The model is built in the following code cell.

In [19]:
model_2 = Sequential([
    DefaultConv2D(16, kernel_size = 3, strides = 3, padding='same', activation='relu', input_shape=(IMG_HEIGHT, IMG_WIDTH ,3)),
    MaxPooling2D(pool_size = 3, strides =2),
    DefaultConv2D(32),
    DefaultConv2D(32),
    MaxPooling2D(pool_size = 3, strides = 2),
    DefaultConv2D(64),
    DefaultConv2D(64),
    MaxPooling2D(pool_size = 3, strides = 2),
    Flatten(),
    Dense(512, activation='relu', kernel_initializer='he_normal'),
    keras.layers.Dropout(0.4),
    Dense(256, activation='relu', kernel_initializer='he_normal'),
    keras.layers.Dropout(0.4),
    Dense(1, activation = 'sigmoid')
])
model_2._name = "model_2"

Model 2 is trained in the next code cell. Again, we'll use 50 epochs to train the model and stop after 5 consecutive epochs where there is no increase in validation accuracy. We expect to see some marginal improvement in validation accuracy.

In [20]:
model_2_history = fit_model(model_2, epochs = 50, display_summary=False, early_stoppage = 5, verbose = 0)
print(r"Maximum validation accuracy of Model 2 is: " + str(max(model_2_history.history['val_accuracy'])))
Maximum validation accuracy of Model 2 is: 0.8448093

Based on the results of Model 2's training, this model achieves a slightly higher validation accuracy than Model 1, reaching approximately 84.5%. Even with almost 60 times the number of parameters as Model 1, each epoch trains in roughly the same amount of time, approximately 8 seconds. Given the relatively quick computation time of Model 2's roughly one million parameters, the next model will further increase complexity to try to improve validation accuracy.

Model 3 - Testing BatchNormalization

Like Model 2, Model 3 continues to build on the previous models by incorporating additional layers and filters. In this model, a regularization layer known as BatchNormalization is included. This layer normalizes the outputs of the preceding layer so that the mean is 0 and the standard deviation is 1, which helps speed up training.

Although there are potential drawbacks of including Dropout and BatchNormalization in the same model, Model 3's results don't appear to be negatively affected.
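As a minimal sketch of the effect (illustrative only, separate from the model below), the layer standardizes each feature using batch statistics at training time:

import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
x = tf.random.normal([32, 10], mean=5.0, stddev=3.0)
y = bn(x, training=True)
# before the learned scale/shift parameters have trained, the output is ~mean 0, std 1
print(tf.reduce_mean(y).numpy(), tf.math.reduce_std(y).numpy())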

In [21]:
model_3 = Sequential([
    DefaultConv2D(64,
                  kernel_size = 7,
                  strides = 3,
                  activation=None,
                  input_shape=(IMG_HEIGHT, IMG_WIDTH ,3)),
    BatchNormalization(),
    Activation('relu'),
    MaxPooling2D(pool_size = 2),

    DefaultConv2D(96, activation = None),
    BatchNormalization(),
    Activation('relu'),
    DefaultConv2D(96, activation = None),
    BatchNormalization(),
    Activation('relu'),
    MaxPooling2D(pool_size = 2),

    DefaultConv2D(128, activation = None),
    BatchNormalization(),
    Activation('relu'),
    DefaultConv2D(128, activation = None),
    BatchNormalization(),
    Activation('relu'),
    MaxPooling2D(pool_size = 2),

    Flatten(),
    Dense(1024, activation='relu', kernel_initializer='he_normal'),
    keras.layers.Dropout(0.4),
    Dense(256, activation='relu', kernel_initializer='he_normal'),
    keras.layers.Dropout(0.4),
    Dense(1, activation = 'sigmoid')
])
model_3._name = "model_3"

Model 3 is trained in the next code cell. Here, we'll double the number of epochs to 100 and stop after 10 consecutive epochs where there is no increase in validation accuracy. We continue to expect to see some marginal improvement in validation accuracy.

In [22]:
model_3_history = fit_model(model_3, epochs = 100, display_summary=False, early_stoppage = 10, reset = True, verbose = 0)
print(r"Maximum validation accuracy of Model 3 is: " + str(max(model_3_history.history['val_accuracy'])))
Maximum validation accuracy of Model 3 is: 0.91525424

Based on the results of Model 3's training, increased model complexity continues to improve the validation accuracy, with Model 3 achieving approximately 91.5%. With almost 5 times the number of parameters as Model 2, Model 3's training starts to slow down, taking approximately 3 times longer per epoch at 21 seconds. This computation time is still not obstructive and the model can continue to be refined.

Although the validation accuracy of Model 3 has improved, there are signs that this model is overfitting, as the training accuracy approaches 100% by the 29th epoch. This, along with the relatively unstable validation accuracy in the later epochs, suggests that the network is memorizing the training images and that different parameters may be required to prevent this behaviour. We will attempt to address this later with image augmentation and a learning rate scheduler (the scheduler's decay profile is sketched below).
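For reference, with the scheduler defined in fit_model (base rate 0.001, lr_schedule_epochs = 20, lr_decay = 0.05), the learning rate holds for 20 epochs and then decays exponentially. A quick sketch of the values:

import math

rate, decay, hold = 0.001, 0.05, 20
for epoch in [10, 20, 30, 60]:
    lr = rate if epoch < hold else rate * math.exp(decay * (hold - epoch))
    print(epoch, lr)    # 0.001, 0.001, ~6.1e-4, ~1.4e-4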

Model 4 - Global Average Pooling

A late addition to our model architecture follows a recent trend in image classification networks. Rather than use a Flatten layer to transition from the convolutional layers to the dense layers, we use a GlobalAveragePooling2D layer, which averages the values in each of the preceding convolutional feature maps. This is a very destructive operation, but it is used throughout the recent high-end classification networks and we wanted to understand its impact on our model.

Additionally, we added further complexity to the model by increasing the number of convolution filters in each of the preceding layers. As GlobalAveragePooling2D ultimately reduces the number of model parameters, we are not concerned about an increase in computation time due to the additional filters.
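A small sketch of how much this layer discards (shapes are illustrative only): it collapses each feature map to a single number, averaging away all spatial information:

import tensorflow as tf

feature_maps = tf.random.normal([1, 12, 12, 256])    # e.g. the output of the last conv block
gap = tf.keras.layers.GlobalAveragePooling2D()
print(gap(feature_maps).shape)    # (1, 256): one average per filter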

In [23]:
model_4 = Sequential([
    DefaultConv2D(64,
                  kernel_size = 7,
                  strides = 3,
                  activation=None,
                  input_shape=(IMG_HEIGHT, IMG_WIDTH ,3)),
    BatchNormalization(),
    Activation('relu'),
    MaxPooling2D(pool_size = 2),

    DefaultConv2D(128, activation = None),
    BatchNormalization(),
    Activation('relu'),
    DefaultConv2D(128, activation = None),
    BatchNormalization(),
    Activation('relu'),
    MaxPooling2D(pool_size = 2),

    DefaultConv2D(256, activation = None),
    BatchNormalization(),
    Activation('relu'),
    DefaultConv2D(256, activation = None),
    BatchNormalization(),
    Activation('relu'),

    GlobalAveragePooling2D(),

    keras.layers.Dropout(0.4),
    Dense(1024, activation='relu', kernel_initializer='he_normal', kernel_regularizer = keras.regularizers.l2(0.01)),
    keras.layers.Dropout(0.4),
    Dense(256, activation='relu', kernel_initializer='he_normal', kernel_regularizer = keras.regularizers.l2(0.01)),
    keras.layers.Dropout(0.4),
    Dense(1, activation = 'sigmoid', kernel_regularizer = keras.regularizers.l2(0.01))
])
model_4._name = "model_4"

Model 4 is trained in the next code cell. Like Model 3, we'll use 100 epochs and stop training after 10 consecutive epochs where there is no increase in validation accuracy. Although we don't expect validation accuracy to improve much, the computation time for Model 4 should decrease compared with Model 3.

In [24]:
model_4_history = fit_model(model_4, epochs = 100, display_summary=False, early_stoppage = 10, verbose = 0)
print(r"Maximum validation accuracy of Model 4 is: " + str(max(model_4_history.history['val_accuracy'])))
Maximum validation accuracy of Model 4 is: 0.9065148

Based on the results of Model 4's training, the model achieves a validation accuracy of approximately 90.7%, on par with Model 3. With almost 5 times fewer parameters than Model 3, Model 4's training time has decreased to approximately 12 seconds per epoch.

Although Model 4's validation accuracy is comparable to Model 3's, there are still signs that this model is overfitting. Not only does the training accuracy approach 100%, but the validation accuracy is increasingly unstable, changing by as much as 10 percentage points between epochs.

Since this model combines a strong validation accuracy with a relatively quick computation time, the previously mentioned image augmentation and learning rate scheduler will be applied to it in an attempt to prevent overfitting.

Model Evaluation

All Models

This section evaluates and visualizes the results of each of the four models highlighted in the previous section. First, each model is rerun for 50 epochs in the next code cell so that the models can be compared from the same baseline. In addition, Model 4 is also trained using the previously mentioned image classification techniques: image augmentation and a learning rate scheduler. In total, 7 models are trained: Models 1 through 4 without image augmentation or a learning rate scheduler, plus Model 4 with image augmentation, with a learning rate scheduler, and with both. The training is executed in the following code cell.

In [25]:
# Create model evaluations for each of the four models and include a LR schedule and augmentation for Model 4
evaluations = []

for index, model in enumerate([model_1, model_2, model_3, model_4]):
    history = fit_model(model, epochs = 50, display_summary=False, lr_schedule = False, augment = False, reset = True, early_stoppage = False, verbose = 0)
    evaluations.append(history)
    if index == 3:
        # add learning rate
        history = fit_model(model, epochs = 50, display_summary=False, lr_schedule = True, augment = False, reset = True, early_stoppage = False, verbose = 0)
        evaluations.append(history)

        # remove learning rate add augment
        history = fit_model(model, epochs = 50, display_summary=False, lr_schedule = False, augment = True, reset = True, early_stoppage = False, verbose = 0)
        evaluations.append(history)

        # add both learning rate and augmentation
        history = fit_model(model, epochs = 50, display_summary=False, lr_schedule = True, augment = True, reset = True, early_stoppage = False, verbose = 0)
        evaluations.append(history)

Now that we have the history for each of the 7 models, a DataFrame is built to compare them. The model history is also saved to a csv file so it can be recovered later if required.

In [26]:
# Create a DataFrame for the model histories
models = ['model_1', 'model_2', 'model_3', 'model_4', 'model_4_lr', 'model_4_aug', 'model_4_lr_aug']
dfs = [pd.DataFrame(h.history).assign(model = m) for m, h in zip(models, evaluations)]
history_df = pd.concat(dfs, sort=False).reset_index(drop=False).rename(columns={'index':'epoch'})
history_melted = history_df.melt(id_vars=['model', 'epoch'])

#history_df.to_csv('model_evalation.csv')
#history_melted.to_csv('model_evaluation_melted.csv')

With the newly constructed DataFrame, a plot is created showing the change in validation accuracy of each of the seven models over each training epoch. As displayed in the plot below, Model 4 with image augmentation and a learning rate scheduler has the best and most stable validation accuracy across training epochs. This suggests the model is both highly accurate and, unlike some of the other models, not overfit.

In [27]:
# Create a plot to visualize the validation frequency change over each epoch
fig, ax = plt.subplots(figsize = (20,10))
colors = {'model_1': sns.color_palette()[0],
          'model_2': sns.color_palette()[1],
          'model_3': sns.color_palette()[2],
          'model_4': sns.color_palette()[4],
         }
for model in history_melted.model.sort_values().unique():
    f = history_melted.variable.isin(['val_accuracy']) & (history_melted.model == model)
    df = history_melted[f]

    if 'model_4' in model:
        color = colors['model_4']
    else:
        color = colors[model]

    if model == 'model_4_lr_aug':
        lw = 5
        ls = '-'
    elif model == 'model_4_lr':
        lw = 3
        ls = '--'
    elif model == 'model_4_aug':
        lw = 3
        ls = 'dotted'
    elif model == 'model_4':
        lw = 3
        ls = '-'
    else:
        lw = 1
        ls = '-'
    ax.plot(df.epoch, df.value, lw = lw, ls = ls, color=color, label = model)
ax.legend(loc='lower right')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.title('Model Accuracy Comparison', size = 22)
plt.show()

Focusing on Model 4 in a bit more detail, we ran an extended training run, letting it train for a maximum of 500 epochs with early stoppage set at 40 epochs based on validation accuracy. Training completed after 167 epochs, and the chart below outlines this run, which included both a learning rate scheduler and image augmentation. The plot clearly displays Model 4's high and stable validation accuracy.

In [30]:
# Create a plot that displays the validation accuracy and loss for the best model
best_history = pd.read_csv('logs/history/model_41_2020_04_04-17_26-43.csv')

fig, ax = plt.subplots(figsize = (12, 8))
best_history.drop(columns='epoch').plot(lw = 3, ax=ax)
ax.plot(best_history.epoch, best_history.val_accuracy, ms = 10, c = sns.color_palette()[2])
plt.annotate(xy = (117, 0.92), color = sns.color_palette()[2], s = 'val_accuracy = {:.4f}'.format(0.96822), fontsize = 14, fontweight = 'demi')
plt.title("Model 4 - Best Run")
plt.ylim((0,1))
plt.xlabel('Epoch')
plt.ylabel('')
plt.show()

Comparing the models' validation accuracies side by side, the following plot outlines the benefits of Model 4's complexity combined with a learning rate scheduler and image augmentation. From Model 1 to the best run of Model 4, validation accuracy improves by roughly 20%.

In [31]:
# Create a plot to compare the validation accuracy of all of the models
with sns.axes_style('whitegrid'):
    fig, ax = plt.subplots(figsize = (14, 8))
    best_results = history_df.groupby('model', as_index = False)[['val_accuracy']].max()
    best = {'model': 'Model 4 - Best', 'val_accuracy': best_history[best_history.val_accuracy == best_history.val_accuracy.max()].val_accuracy.values[0]}
    best_results = best_results.append(best, ignore_index = True)
    #sns.barplot(data = best_results, x = 'model', y = 'val_accuracy', color = ['grey']*7+['blue'], ax = ax)
    rects = ax.bar(x=best_results.model, height = best_results.val_accuracy, color = ['grey']*7+[sns.color_palette('Blues')[4]])

    for rect in rects:
        height = rect.get_height()
        ax.text(rect.get_x() + rect.get_width()/2., height - 0.05,
                '{:.4f}'.format(height), color = 'white', fontweight = 'medium', fontsize = 12,
                ha='center', va='bottom')

    ax.set_xticklabels(labels = ["1", "2", "3", "4", "4-Aug", "4-LR", "4-Aug + LR", "4-Best"])
    sns.despine(left=True)
    ax.xaxis.grid(False)
    plt.xlabel("Model")
    plt.title('Highest Accuracy from Each Model')
    plt.show()

Model 4

In order to look into the results of the best training run of Model 4 in a bit more detail, the model is evaluated against the test dataset. As shown below, 3.4% of the test dataset was misclassified, which is consistent with the model's validation accuracy. Additionally, a confusion matrix is constructed to outline where the model most often predicted incorrectly.

In [32]:
# Load the best training run of Model 4
best_model = keras.models.load_model("logs/models/model_41_2020_04_04-17_26-43.h5")

# Evaluate the best training run of Model 4 on the test dataset
best_model.evaluate(test_ds, verbose = 0)

# Extract the images and labels from the test dataset for analyzing the misclassified images
for imgs, labels in test_ds:
    y_test = labels.numpy()
    X_test = imgs.numpy()
y_probs = best_model.predict(test_ds).reshape(-1, )
y_predict = (y_probs > 0.5)*1

# Create an object of misclassified images and labels
misclassified_imgs = X_test[y_predict != y_test]
misclassified_labels = y_predict[y_predict != y_test]

print(r"The number of images that were misclassified in the test dataset: " + str(len(misclassified_labels)) + r" or " + str(round(len(misclassified_labels) / len(y_test) * 100, 2)) + r"% of the test datset.")

# Construct the confusion matrix using the predicted and actual labels
cm = confusion_matrix(y_test, y_predict)

C_df = pd.DataFrame(cm,
                    index= ['Cat', 'Dog'],
                    columns= ['Cat', 'Dog'])

fig, ax = plt.subplots(figsize = (5, 5))
sns.heatmap(C_df,
            cmap = 'Blues', cbar = False,
            annot = True, fmt = 'd', annot_kws={"fontsize":12},
            ax = ax)
ax.tick_params(labelsize = 14, which = 'both')
ax.set_yticklabels(ax.get_yticklabels(), rotation = 0)
plt.xlabel('Predicted Class')
plt.ylabel('Actual Class')
plt.title('Cats and Dogs')
plt.show()
The number of images that were misclassified in the test dataset: 85 or 3.4% of the test dataset.

To get a sense of what types of images Model 4 incorrectly classified, a plot was generated showing a sample of the misclassified images along with the class each was assigned.

In [33]:
# Plot a few of the misclassified images
CLASSES = ['Cat', 'Dog']
f, (ax1, ax2, ax3, ax4) = plt.subplots(1, 4, figsize=(12, 3))
ax1.imshow(misclassified_imgs[2])
ax1.set_title('Predicted "{}"'.format(CLASSES[misclassified_labels[2]]))
ax1.axis('off')
ax2.imshow(misclassified_imgs[1])
ax2.set_title('Predicted "{}"'.format(CLASSES[misclassified_labels[1]]))
ax2.axis('off')
ax3.imshow(misclassified_imgs[16])
ax3.set_title('Predicted "{}"'.format(CLASSES[misclassified_labels[16]]))
ax3.axis('off')
ax4.imshow(misclassified_imgs[3])
ax4.set_title('Predicted "{}"'.format(CLASSES[misclassified_labels[3]]))
ax4.axis('off')
plt.show()

Based on a sample of the misclassified images, there are a few consistent aspects of the images that are worth mentioning:

  • Images of cats with their bellies exposed are incorrectly classified as dogs
  • Images of small or young looking dogs are incorrectly classified as cats
  • Images with both dogs and cats are incorrectly classified
  • Images of a cat or a dog with humans in the frame are incorrectly classified

It's clear from looking at these misclassified images that additional image augmentation could be added to achieve better invariance. Specifically, we chose not to include a vertical flip augmentation. Additionally, we only rotate through a small range of angles, whereas a much larger range could have improved our final accuracy. These augmentation changes would be worth considering in future investigations (a sketch of the first idea follows).
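As a sketch of that first suggestion (hypothetical, not part of the trained pipeline), a vertical flip could be added alongside img_flip in the augmentation list passed to ds_augment:

def img_flip_vertical(img, label):
    # hypothetical extra augmentation, following the pattern of img_flip above
    return tf.image.random_flip_up_down(img), label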

Conclusion and Summary

This project outlined the process of constructing a convolutional neural network to classify images of cats and dogs. It presented four different models and showed how validation accuracy improves with increased complexity, image augmentation and a learning rate scheduler. The best model was able to achieve approximately 97% accuracy on the validation dataset, with a similar accuracy achieved on the testing dataset.

References

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Rafal Jozefowicz, Yangqing Jia, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Mike Schuster,Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Jonathon Shlens,Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

[2] Geron, A. Hands-On Machine Learning with Scikit-Learn, Keras & Tensorflow, 2nd ed. Sebastopol, CA, USA: O'Reilly Media, 2019.

[3] Deotte, C. "Data Augmentation using GPU/TPU for Maximum Speed!" [Online]. Available: https://www.kaggle.com/cdeotte/rotation-augmentation-gpu-tpu-0-96

[4] Bulten, W. "Simple and efficient data augmentations using the Tensorflow tf.Data and Dataset API" [Online]. Available: https://www.wouterbulten.nl/blog/tech/data-augmentation-using-tensorflow-data-dataset/

[5] Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015).

[6] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

[7] Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." The journal of machine learning research 15.1 (2014): 1929-1958.