Neural Network Optimization Techniques

Introduction by Mohammad:

For this reading group we'll give an overview of some recently proposed techniques for optimizing neural network parameters. SGD, Momentum, NAG, and AdaGrad have already been covered in the Stanford CNN class.
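As a refresher before the discussion, here is a minimal sketch of the update rules for the four methods mentioned above, applied to a hypothetical 1-D quadratic objective f(w) = (w − 3)². The objective, learning rates, and step counts are illustrative choices, not anything prescribed by the readings.

```python
import math

# Hypothetical toy objective: f(w) = (w - 3)^2, so grad f(w) = 2 * (w - 3).
def grad(w):
    return 2.0 * (w - 3.0)

def sgd(w, lr=0.1, steps=100):
    # Plain gradient descent: step against the gradient.
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def momentum(w, lr=0.1, mu=0.9, steps=300):
    # Accumulate a velocity that smooths successive gradients.
    v = 0.0
    for _ in range(steps):
        v = mu * v - lr * grad(w)
        w += v
    return w

def nag(w, lr=0.1, mu=0.9, steps=300):
    # Nesterov: evaluate the gradient at the look-ahead point w + mu * v.
    v = 0.0
    for _ in range(steps):
        v = mu * v - lr * grad(w + mu * v)
        w += v
    return w

def adagrad(w, lr=0.5, eps=1e-8, steps=200):
    # Scale each step by the accumulated sum of squared gradients,
    # so frequently-updated directions get smaller effective learning rates.
    cache = 0.0
    for _ in range(steps):
        g = grad(w)
        cache += g * g
        w -= lr * g / (math.sqrt(cache) + eps)
    return w
```

All four drive w from 0 toward the minimizer 3; the point of the sketch is how each rule modifies the raw gradient step, not the particular constants.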

In general what we want to know is:

  • Why do we need these optimization methods?
  • What are each of them trying to solve?
  • How can we connect them together?
  • How can we know which one is useful for our models?
  • Do we have a winner among all these methods?

Required reading

Take-home message:

  • With adaptive methods, hyperparameter tuning is less critical;
  • Well-tuned Stochastic Gradient Descent is hard to beat by a significant margin.
