In this tutorial, we are going to create an image classifier to classify cats and dogs with more than 80% accuracy. I did this project when I was in the final year of my computer science degree. Today, I’ll walk you through the entire project so that you can also do it.
You don’t need to be an expert coder to do this project. If you know Python fundamentals, you can create your first image classifier pretty easily.
Before we jump right in, we will look at some basic concepts of image classification. Then, we will move to the coding part.
What is Image Classification?
Consider an example:
A 3-year-old baby is an expert at classifying things. The baby can identify its mom, dad, relatives, toys, food, and much more. How did the baby gain all this knowledge?
Through training. The baby saw various things for the first time and could not understand what they were. But after seeing them several times and getting input from the people around, the baby became a pro at classifying everything.
The computer is like a newborn baby. It does not know the difference between a cat and a dog. But we know it. So, let’s help the computer to identify cats and dogs correctly.
We will be using the Python programming language to give instructions to the computer.
Hopefully, the computer will be able to distinguish between cats and dogs by the end of this tutorial.
Our Action Plan
Let’s do a quick overview of what we are going to do. We need to collect lots of images of cats and dogs.
Then, we need to put all the cat images in a folder and tell the computer: "These are all cats; go through all of these images and learn as much as you can".
We will do the same with the dogs. Finally, we will test whether the machine obeyed us or not.
Before you start creating the image classification model, make sure you have all the libraries and tools installed in your system. You can refer to this article for setting up your environment for doing this image classification project.
Cats and Dogs Data Set
Our first task is to find a lot of images of cats and dogs. Good news! Someone has already done that for you. Go to this link to download the data set.
You will see a page like this. Click on the Data tab, scroll down, and you will see a link that says Download all.
You need a Kaggle account to download these files, so sign up there and reload the page to download the data set.
The zip file is more than 800 MB in size. Download it and extract it. You will see two folders inside the main folder: test and train.
Preparing the Data Set
There are some issues with the data set. The folder test is not labeled, and we don’t want that. So, go ahead and delete the folder test. Don’t worry. We will create a new test folder.
Now we have only the train folder. Inside that, we have two child folders called cats and dogs. Each folder contains 12500 images.
Once you have finished browsing the cute images, come back to the parent directory. Let's create a new folder and name it test.
Inside test, we need two folders, which are cats and dogs. As of now, these folders are empty. So we need to add some images to these folders.
We will go to train -> cats, cut 2500 cat images, and paste them inside test -> cats. Do the same with the dogs.
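If you would rather script this step than cut and paste by hand, here is a small sketch using only the Python standard library. The train/test paths in the commented example are assumptions; adjust them to wherever you extracted the data set.

```python
import os
import random
import shutil

def move_sample(src_dir, dst_dir, n, seed=42):
    """Move n randomly chosen files from src_dir into dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    files = sorted(os.listdir(src_dir))
    random.Random(seed).shuffle(files)  # reproducible random sample
    for name in files[:n]:
        shutil.move(os.path.join(src_dir, name), os.path.join(dst_dir, name))

# Example (hypothetical paths -- adjust to your layout):
# move_sample('train/cats', 'test/cats', 2500)
# move_sample('train/dogs', 'test/dogs', 2500)
```

Sampling randomly (rather than taking the first 2500 file names) avoids any ordering bias in the downloaded data.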
Our test and train folders are ready. This is how our data set looks:
Our data set is ready. Now, we can start coding our image classification model.
We are going to use the Keras library to create our image classification model. Keras is a Python library for machine learning built on top of TensorFlow.
TensorFlow is a powerful deep learning library, but it can be difficult to use, especially for beginners. Keras makes things much simpler, so we will be using Keras today.
Creating the Image Classification Model
Let’s start the coding part. We will learn each line of code on the go. First things first, we will import the required libraries and methods into the code.
from keras.models import Sequential
from keras.layers import Conv2D, Activation, MaxPooling2D, Dense, Flatten, Dropout
import numpy as np
These are the things that we need. Conv2D, Activation, MaxPooling2D, Dense, Flatten, and Dropout are different types of layers that are available in keras to build our model. We need several layers for our model since we are using deep learning.
Let’s initialize a convolutional neural network using the sequential model of keras.
classifier = Sequential()
There are two ways to build keras models, which are Sequential and Functional. We are using sequential here to build our model. The sequential API helps us to create models in a layer-by-layer format.
Now we have a convolutional neural network (CNN). CNN is a class of deep learning networks, which is most commonly used for image processing and image classification purposes.
A CNN has several layers. So, let's add some layers to our classifier.
classifier.add(Conv2D(32,(3,3), input_shape=(64,64,3)))
So, we added a convolutional layer as the first layer. Conv2D stands for a 2-dimensional convolutional layer.
Here, 32 is the number of filters needed. A filter is an array of numeric values. (3,3) is the size of the filter, which means 3 rows and 3 columns.
The input image has dimensions 64×64×3: 64 pixels in height, 64 in width, and 3 color channels (RGB). Each number in this (64, 64, 3) array is a value from 0 to 255 describing the pixel intensity at that point.
The output of this layer will be some feature maps. The training images will go through this layer, and we will obtain some feature maps at the end of this layer. A feature map is a map that shows some features of the image.
Now, let's create the next layer.
classifier.add(Activation('relu'))
Here, we pass the feature maps through an activation layer called ReLU. ReLU stands for rectified linear unit, and it is an activation function.
An activation function of a neuron defines the output of that neuron, given some input. This output is then used as input for the next neuron, and so on until the desired solution is obtained.
ReLU replaces all the negative pixel values in the feature map with 0.
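To make that concrete, here is what ReLU does to a tiny feature map, sketched with NumPy. This is just an illustration, not part of the Keras model:

```python
import numpy as np

# ReLU is applied elementwise: negative values become 0,
# non-negative values pass through unchanged.
feature_map = np.array([[-2.0,  1.5],
                        [ 0.0, -0.5]])
relu_out = np.maximum(feature_map, 0)
# relu_out:
# [[0.  1.5]
#  [0.  0. ]]
```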
Now, let's add the next layer.
classifier.add(MaxPooling2D(pool_size=(2,2)))
Pooling helps to reduce the dimensionality of each feature map and retains the essential information.
This helps to decrease the computational complexity of our network.
Here, we use max-pooling with a 2×2 filter. The filter takes the maximum value from each 2×2 region of the feature map.
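Here is a small NumPy illustration of 2×2 max-pooling on a 4×4 feature map. Again, this is only for intuition; Keras handles it for us:

```python
import numpy as np

# A 4x4 feature map with values 0..15.
fm = np.arange(16, dtype=float).reshape(4, 4)

# Split into non-overlapping 2x2 pools and keep the max of each.
# Element [i, j, k, l] of the reshaped array is fm[2*i + j, 2*k + l],
# so taking the max over axes (1, 3) takes the max of each pool.
pooled = fm.reshape(2, 2, 2, 2).max(axis=(1, 3))
# pooled:
# [[ 5.  7.]
#  [13. 15.]]
```

Notice that the 4×4 map shrinks to 2×2 while each pool's strongest response survives, which is exactly the dimensionality reduction described above.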
A classic convolutional neural network has 3 convolutional blocks followed by a fully connected layer. We created the first block of 3 layers, and we can repeat it twice more.
classifier.add(Conv2D(32,(3,3)))
classifier.add(Activation('relu'))
classifier.add(MaxPooling2D(pool_size=(2,2)))
classifier.add(Conv2D(32,(3,3)))
classifier.add(Activation('relu'))
classifier.add(MaxPooling2D(pool_size=(2,2)))
To prevent overfitting, we use a dropout layer in our model. Overfitting is a modeling error that occurs when an overly complex model fits the training data too closely and fails to generalize to new data. The dropout layer drops a random set of activations by setting them to zero as data flows through it.
To prepare our model for the fully connected layers, we first flatten the feature maps to one dimension.
Then we initialize a fully connected layer using Dense and apply the ReLU activation function to it.
Finally, let’s add the dropout layer.
After dropout, we initialize one more fully connected layer. Since this is a binary classification problem, this final Dense layer has a single output unit.
We apply a sigmoid activation function to it so that the output is converted into a probability between 0 and 1. A sigmoid function is a mathematical function with an S-shaped (sigmoid) curve.
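Putting the steps above together, the classifier head might look like the sketch below. The 64 units in the hidden Dense layer and the 0.5 dropout rate are assumptions on my part (the text does not state them); the single sigmoid output follows from the binary classification setup.

```python
from keras.models import Sequential
from keras.layers import Flatten, Dense, Dropout, Activation

# Sketch of the classifier head only. The (6, 6, 32) input shape is
# roughly what three conv/pool blocks produce from a 64x64x3 image.
head = Sequential()
head.add(Flatten(input_shape=(6, 6, 32)))  # 6*6*32 = 1152 values
head.add(Dense(64))                        # hidden size: an assumption
head.add(Activation('relu'))
head.add(Dropout(0.5))                     # dropout rate: an assumption
head.add(Dense(1))                         # one output unit for binary classification
head.add(Activation('sigmoid'))            # squashes the output to (0, 1)
```

In the tutorial these layers are added to the existing `classifier` model right after the third pooling layer, in the same order.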
That’s it with building our classifier model. Now, we have a CNN model with several layers in it.
Let's print a summary of the classifier with classifier.summary(). This shows each layer along with its output shape and parameter count.
Compiling the Model
We have a pretty good model now. Before we train our model with all the images, we have to compile the model. There is a specific method for that in keras, which is compile().
classifier.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
The rmsprop optimizer performs a variant of gradient descent for this model. Gradient descent is an optimization algorithm that helps find the minimum value of a function.
binary_crossentropy is the standard loss function for binary classification problems. A loss function, also known as a cost function, measures how well a prediction model predicts the expected outcome.
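For intuition, binary cross-entropy for a single prediction p with true label y is -(y·log p + (1-y)·log(1-p)). A quick NumPy sketch (an illustration of the formula, not Keras's implementation):

```python
import numpy as np

def binary_crossentropy(y_true, y_pred):
    # Clip predictions to avoid log(0), then average per-sample losses.
    y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)
    return float(np.mean(-(y_true * np.log(y_pred)
                           + (1 - y_true) * np.log(1 - y_pred))))

# A confident correct prediction costs little...
low = binary_crossentropy(np.array([1.0]), np.array([0.99]))
# ...while a confident wrong prediction costs a lot.
high = binary_crossentropy(np.array([1.0]), np.array([0.01]))
```

Minimizing this loss therefore pushes the model toward confident, correct probabilities.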
We also set the metrics to accuracy so that we will get the details of the accuracy after training.
Before we train the model, we apply data augmentation to reduce overfitting. Data augmentation artificially expands the data set by generating modified versions of the existing images.
So, we will flip, zoom, and do a lot of things with all the existing data set images, so that the machine will get a variety of types of images to study.
For that, we will import the ImageDataGenerator method from the keras library.
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)
We set these parameters so that the model trains on images in a variety of positions and orientations, which improves accuracy.
Setting Train and Test directories
Before we train the model, we need to point Keras at the train and test directories. Keras provides a method called flow_from_directory() for this.
training_set = train_datagen.flow_from_directory('C:/Users/Lab/Project/train',
                                                 target_size=(64,64),
                                                 batch_size=32,
                                                 class_mode='binary')
test_set = test_datagen.flow_from_directory('C:/Users/Lab/Project/test',
                                            target_size=(64,64),
                                            batch_size=32,
                                            class_mode='binary')
After we run this cell in our notebook, the machine will say that it has found the images in our data set.
Training the classifier
Finally, it’s time to train the model. We have done a lot of things to make our model and data as perfect as possible. Now, let’s see how we can train this image classification model.
We will also import the display method from the IPython.display package and Image from PIL (the Pillow library). These are handy for viewing images in the notebook, although they are not strictly required for training.
We will train the model using the fit_generator() method of keras library.
from IPython.display import display
from PIL import Image

classifier.fit_generator(training_set,
                         steps_per_epoch=625,
                         epochs=30,
                         validation_data=test_set,
                         validation_steps=5000)
Here, we pass the training_set as the first argument of the fit_generator() method. I set steps_per_epoch to 625 (our 20,000 training images divided by the batch size of 32) and epochs to 30.
There are no universally correct values for epochs and steps_per_epoch; they vary with the model and the data set.
Generally, we find the best values for a model by trying many combinations and keeping the ones that work best.
Then, we passed the test_set as the validation_data and set the validation_steps to 5000.
Let's run this cell. As you will see, training the classifier takes a lot of time. If you want faster results, reduce epochs and steps_per_epoch to lower values.
The larger these values, the longer the machine takes to learn, but longer training generally yields better accuracy (up to a point).
As you can see, I got a validation accuracy of 83.7%.
If you took a lot of time and finally completed training your model successfully, well done. Sometimes, it takes a lot of patience to do this.
You have patiently trained your model, and now, you must save this model if you want to use this model in the future.
Otherwise, to test the model a few days later, you would need to train the whole model again from scratch. That would be a waste of time.
So, as soon as you finish training the model, save it. Let’s see how we can do this. It is pretty simple.
Saving the trained model
To save the trained model, we use a Keras method called save(). Simple as that:
classifier.save('catdog_cnn_model.h5')
This line of code creates an HDF5 file named catdog_cnn_model.h5. When we want to use this model later, we just need to load the saved file into our code.
There is a method called load_model() for this purpose. We first need to import it from the keras.models package.
Then we can load the model, and use it for testing without wasting any time for training the model.
from keras.models import load_model

classifier = load_model('catdog_cnn_model.h5')
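If you want a quick, self-contained sanity check of the save/load mechanism before relying on it for the real classifier, a toy round trip looks like this (the tiny one-layer model is only for illustration):

```python
import os
import tempfile
from keras.models import Sequential, load_model
from keras.layers import Dense

# Build a tiny model, save it to HDF5, and load it back.
model = Sequential()
model.add(Dense(1, input_shape=(4,), activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy')

path = os.path.join(tempfile.mkdtemp(), 'toy_model.h5')
model.save(path)            # same call as classifier.save('catdog_cnn_model.h5')
restored = load_model(path)
```

The restored model keeps its architecture, weights, and compile settings, so you can call predict() on it immediately.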
Testing the classifier
Finally, let's see how much the machine has learned. We will test the model by giving it some random images of cats and dogs, and see whether it can successfully identify which is a cat and which is a dog.
We give a random image to the model using the load_img() method. The image is then converted to an array of numbers using the img_to_array() method.
Then we expand its dimensions using the expand_dims() method, because predict() expects a batch of images rather than a single image.
Finally, we pass this image to the model's predict() method. The machine will quickly process the image and identify whether it is a dog (value 1) or a cat (value 0); flow_from_directory() assigns labels alphabetically, so cats are 0 and dogs are 1.
import numpy as np
from keras.preprocessing import image

test_image = image.load_img('C:/Users/Lab/image.jpeg', target_size=(64,64))
test_image = image.img_to_array(test_image)
test_image = test_image / 255.  # rescale to match the 1./255 scaling used in training
test_image = np.expand_dims(test_image, axis=0)
result = classifier.predict(test_image)

if result >= 0.5:
    prediction = 'dog'
else:
    prediction = 'cat'

print(prediction)
The computer is now an expert in classification. It just predicted the result correctly.
I downloaded 5 high-quality images each of cats and dogs from pexels.com and tested the model on all 10 images. My classifier did a pretty good job: it classified 8 images correctly and got 2 wrong. Still, that is pretty good accuracy.
We have successfully created an image classifier using deep learning with the keras library of Python.
Our image classifier predicted the results with an accuracy of 83.7 percent.
How did your image classifier perform? I’m curious to know about how perfectly your classifier predicted cats and dogs. Let me know the details in the comments section.
I spent a long time making this article. I would appreciate it if you would be willing to share it. It will encourage me to create more useful tutorials like this.