Preprocessing the images
The images for the different classes will be stored in different folders, so it will be easy to label their classes. We will read the images using Opencv functions, and will resize them to different dimensions, such as 224 x 224 x 3. We'll subtract the mean pixel intensity channel-wise from each of the images, based on the ImageNet dataset. This means subtraction will bring the diabetic retinopathy images to the same intensity range as that of the processed ImageNet images, on which the pre-trained models are trained. Once each image has been prepossessed, they will be stored in a numpy array. The image preprocessing functions can be defined as follows:
def get_im_cv2(path,dim=224):
img = cv2.imread(path)
resized = cv2.resize(img, (dim,dim), cv2.INTER_LINEAR)
return resized
def pre_process(img):
img[:,:,0] = img[:,:,0] - 103.939
img[:,:,1] = img[:,:,0] - 116.779
img[:,:,2] = img[:,:,0] - 123.68
return img
The images are read through the opencv function imread, and then resized to (224,224,3) or any given dimension using an interlinear interpolation method. The mean pixel intensity in the red, green, and blue channels for the ImageNet images are 103.939, 116.779, and 123.68, respectively; the pre-trained models are trained after subtracting these mean values from the images. This activity of mean subtraction is used to center the data. Centering the data around zero helps fight vanishing and exploding gradient problem, which in turn helps the models converge faster. Also, normalizing per channel helps to keep the gradient flow into each channel uniformly. Since we are going to use pre-trained models for this project, it makes sense to mean correct our images based on the channel wise mean pixel values before feeding them into the pre-trained networks. However, its not mandatory to mean correct the images based on the mean values of ImageNet on which the pre-trained network is based on. You can very well normalize by the mean pixel intensities of the training corpus for the project.
Similarly, instead of performing channel wise mean normalization, you can choose to do a mean normalization over the entire image. This entails subtracting the mean value of each image from itself. Imagine a scenario where the object to be recognized by the CNN is exposed under different lighting conditions such as in the day and night. We would like to classify the object correctly irrespective of the light conditions, however different pixel intensities would activate the neurons of the neural network differently, leading to a possibility of wrong classification of the object. However, if we subtract the mean of each image from itself, the object would no longer be affected by the different lighting conditions. So depending on the nature of the images that we work with, we can choose for ourselves the best image normalizing scheme. However any of the default ways of normalization tend to give reasonable performance.