Batch vs Mini-Batch vs Stochastic Gradient Descent

Most deep learning models are trained with a variant of the Gradient Descent optimization algorithm, which finds the best set of parameters for the network given the loss function and the target variables.

The basic idea of Gradient Descent is to traverse the surface of the loss function in the direction of the negative gradient, moving step by step toward a local minimum.
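
To make that idea concrete, here is a minimal NumPy sketch of a single gradient descent step for a linear model with mean squared error loss. The function name, the loss, and the learning rate are illustrative assumptions, not something taken from the video.

```python
import numpy as np

def gradient_step(w, X, y, lr=0.01):
    """One gradient descent step for linear regression with MSE loss.
    Illustrative sketch: names and hyperparameters are assumptions."""
    predictions = X @ w                 # current model output
    error = predictions - y
    grad = 2 * X.T @ error / len(y)     # gradient of the mean squared error w.r.t. w
    return w - lr * grad                # move against the gradient
```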

This short video explains what Batch Gradient Descent, Mini-Batch Gradient Descent, and Stochastic Gradient Descent are, the main differences between them, and when each is used.
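
As a rough illustration of how the three variants differ, the sketch below uses a single training loop where the batch size selects the variant: the full dataset gives Batch Gradient Descent, a batch size of 1 gives Stochastic Gradient Descent, and anything in between gives Mini-Batch Gradient Descent. The function, loss, and hyperparameters are assumptions for illustration only.

```python
import numpy as np

def train(X, y, batch_size, epochs=10, lr=0.01):
    """Gradient descent on MSE loss; batch_size selects the variant:
    len(X) -> batch GD, 1 -> stochastic GD, in between -> mini-batch GD."""
    w = np.zeros(X.shape[1])
    n = len(X)
    for _ in range(epochs):
        idx = np.random.permutation(n)           # shuffle samples each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(yb)
            w -= lr * grad                        # one update per batch
    return w

# Batch GD: one update per epoch, computed on all samples
# w = train(X, y, batch_size=len(X))
# Stochastic GD: one update per individual sample
# w = train(X, y, batch_size=1)
# Mini-batch GD: updates on small batches, e.g. 32 samples
# w = train(X, y, batch_size=32)
```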
