Image Representation

Introduction by Zhiming:

Because we will take the CNN course from Stanford University, this reading group will focus only on shallow image representations, not on deep learning. The goal of this reading group is to understand the basic idea of Bag of Words, how to incorporate spatial information into BoW, some advanced encoding methods (VLAD, the Improved Fisher Vector), and some SVM kernels.

Required reading:

Optional tutorials:

Take-home message:

To accurately represent and classify images, years of research have established the following pipeline, where each stage improves overall performance:

  1. Extract features: HoG, SIFT, Bag of Words, (soft) k-means, VLAD, EM, Fisher Vector…
  2. Apply a kernel SVM: linear or not, projections into other spaces, sqrt, etc. (improves both accuracy and speed)
  3. Metric learning: a projection metric trained to extract “good” features
  4. Deep learning: replaces the previous stages; everything is learned (needs lots of data)
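The first two stages of the pipeline can be sketched in a few lines of numpy. This is a minimal illustration, not a reference implementation: the descriptor dimensionality, vocabulary size, and random data are all stand-ins for real SIFT descriptors, and the square-root step is the Hellinger feature map, one of the “sqrt” kernel tricks the list alludes to (a linear SVM on the mapped vectors behaves like a kernel SVM on the raw histograms).

```python
import numpy as np

def kmeans(descriptors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the codebook (visual vocabulary)."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # Hard-assign each descriptor to its nearest center...
        d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # ...then move each center to the mean of its assigned descriptors.
        for j in range(k):
            pts = descriptors[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers

def bow_histogram(descriptors, codebook):
    """L1-normalized visual-word histogram for one image."""
    d = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(d.argmin(1), minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def hellinger_map(hist):
    """Element-wise sqrt: a linear kernel on this map equals the
    Hellinger kernel on the original histograms."""
    return np.sqrt(hist)

# Illustrative random data in place of real SIFT descriptors.
rng = np.random.default_rng(1)
train_descs = rng.normal(size=(500, 16))   # descriptors pooled from training images
codebook = kmeans(train_descs, k=32)
h = bow_histogram(rng.normal(size=(80, 16)), codebook)  # one "image"
x = hellinger_map(h)                       # feed x to a linear SVM
```

In practice the codebook is trained once on descriptors pooled from many images, each image becomes one mapped histogram `x`, and those vectors go to the SVM of stage 2.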

Further reading:
