Classification of images using kernelized Support Vector Machines and Convolutional Neural Networks.
We explored different ways of classifying images with SVMs and performed empirical comparisons between them.
We used a partition of the CIFAR-10 data set, consisting of 10k 32x32 color images, each belonging to one of ten classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck.
In order to convert the image to 1D data to be used with SVMs, we tried the following approaches:
- Flattening the image. This preserves all the original data but loses all spatial information, which makes it poorly suited to this type of task.
- Using a kernel for the SVM that allows having matrices as input. In our case, we tried the simple matrix trace: $k(X, X') = \mathrm{Tr}(X^T X')$
- Using the image histogram as input. It also loses spatial information, but the input data is more meaningful than the raw pixel values.
- Using a pretrained deep neural network for feature extraction. In our case, we used VGG16.
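As a minimal sketch of the first and third approaches, the snippet below builds the flattened and histogram representations for a single toy image (random pixel values standing in for a CIFAR-10 sample); the bin count of 32 per channel is an illustrative assumption, not necessarily what we used:

```python
import numpy as np

# Toy stand-in for a CIFAR-10 sample: a 32x32 RGB image, values in [0, 255].
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# 1. Flattening: keeps every raw pixel value but discards spatial layout.
flat = image.reshape(-1)  # shape (3072,)

# 2. Histogram: per-channel color histograms, concatenated. Also loses
#    spatial information, but summarizes color content more meaningfully
#    than raw pixel order. (32 bins per channel is an assumed choice.)
hist = np.concatenate([
    np.histogram(image[..., c], bins=32, range=(0, 256))[0]
    for c in range(3)
])  # shape (96,)

print(flat.shape, hist.shape)  # → (3072,) (96,)
```

Either vector can then be fed directly to a standard SVM implementation.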
We compared the simpler grayscale case with the original RGB data. We also tried different SVM kernels: linear, polynomial, Gaussian, Laplacian, logarithmic and the aforementioned matrix trace.
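The matrix-trace kernel is simple enough to check numerically. A useful observation (sketched below on random grayscale-sized matrices, not our actual data) is that $\mathrm{Tr}(X^T X') = \sum_{ij} X_{ij} X'_{ij}$, i.e. it coincides with a linear kernel applied to the flattened images:

```python
import numpy as np

# Two hypothetical 32x32 grayscale image matrices.
rng = np.random.default_rng(1)
X = rng.random((32, 32))
Xp = rng.random((32, 32))

# Matrix-trace kernel: k(X, X') = Tr(X^T X').
k_trace = np.trace(X.T @ Xp)

# The same value as the dot product of the flattened images,
# i.e. a linear kernel on the flattened representation.
k_linear = X.ravel() @ Xp.ravel()

print(np.isclose(k_trace, k_linear))  # → True
```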
Conclusions
Using the CNN for preprocessing gave the best accuracy (by a large margin) and it also trained the fastest! So there is no doubt about the winner.
We also found that inputting the flattened image gave better performance than using the histogram. However, the models relied mostly on the color values when deciding: e.g. images with a large amount of blue pixels were usually correctly classified as either planes or ships, with some confusion between the two.
Make sure to read the whole report, available here, for all the details about the methodology and results.