Machine Learning and Particle Motion in Liquids: An Elegant Link


In this article, I argue, based on recent findings, that viewing stochastic gradient descent (or its mini-batch variant) as a Langevin stochastic process with an extra level of randomization, introduced via the learning rate, helps explain why stochastic gradient descent works so remarkably well as a global optimizer.
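
To make the analogy concrete, here is a minimal Python sketch (my own illustration, not code from the findings discussed in this article) contrasting a mini-batch SGD step, where noise enters implicitly through random sub-sampling, with an Euler-discretized overdamped Langevin step, where Gaussian noise is added explicitly at a scale set by the step size. The names `loss_grad` and `temperature` are assumptions chosen for this toy example.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(w, batch):
    # Gradient of the average quadratic loss 0.5 * (w - x)^2 over the batch.
    return np.mean(w - batch)

data = rng.normal(loc=1.0, scale=2.0, size=1000)  # toy dataset
eta = 0.1            # learning rate: also sets the scale of the SGD noise
temperature = 0.05   # hypothetical Langevin temperature for this sketch
w_sgd, w_langevin = 5.0, 5.0

for _ in range(500):
    # Mini-batch SGD: randomness enters implicitly through batch sub-sampling.
    batch = rng.choice(data, size=10)
    w_sgd -= eta * loss_grad(w_sgd, batch)

    # Euler discretization of overdamped Langevin dynamics: full gradient
    # plus explicit Gaussian noise whose variance scales with the step size.
    w_langevin += -eta * loss_grad(w_langevin, data) \
                  + np.sqrt(2.0 * eta * temperature) * rng.normal()

print(f"SGD:      w = {w_sgd:.3f}")
print(f"Langevin: w = {w_langevin:.3f}")
print(f"Minimizer w* = mean(data) = {data.mean():.3f}")
```

On this toy quadratic loss, both iterates fluctuate around the same minimizer, which is the intuition behind treating the learning rate as a temperature-like source of noise.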