

Recall that in the world of optimization we have a space $\X = \R^d$ and a convex objective function $f \colon \X \to \R$ that we wish to minimize. We have seen there are (at least) two kinds of dynamics for solving the optimization problem in continuous time. There is the first-order gradient flow dynamics:

$$\dot X_t = -\nabla f(X_t),$$

and there is also the second-order accelerated gradient flow dynamics:

$$\ddot X_t + \frac{3}{t} \dot X_t + \nabla f(X_t) = 0.$$

These dynamics are interesting to contrast. Whereas gradient flow has a clear and simple interpretation as a greedy steepest descent flow, accelerated gradient flow is not so intuitive because it is oscillatory. Yet accelerated gradient flow achieves a faster convergence rate of $O(1/t^2)$ whenever $f$ is convex, compared to the $O(1/t)$ rate of gradient flow. The contrast between gradient flow and accelerated gradient flow is conceptually reminiscent of the difference between Aristotelian and Newtonian physics.

As we shall see, adopting this physical perspective yields powerful insights into the behavior of the dynamics in the world of optimization, just as physics helps us understand (and master) the physical world around us. In particular, this physical perspective grants us access to the principle of least action, which is an equivalent reformulation of Newtonian (and practically all of) physics from an intrinsic spacetime view, and which will turn out to play a prominent role in our study of accelerated methods in optimization.

As a personal note, let me say that while this post is physics-based, I myself don't actually have much background in physics.
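To make the contrast concrete, here is a minimal numerical sketch (not part of the original post) that integrates both flows on an ill-conditioned convex quadratic. The test function, step size, horizon, and initial point are all illustrative assumptions chosen only to make the difference in behavior visible.

```python
# A minimal numerical sketch, not from the original post: semi-implicit Euler
# integration of the two flows on an ill-conditioned convex quadratic.  The
# test function f(x) = 0.5 x^T A x, step size, horizon, and initial point are
# all illustrative assumptions.
import numpy as np

A = np.diag([0.01, 1.0])             # ill-conditioned convex quadratic
f = lambda x: 0.5 * x @ A @ x
grad_f = lambda x: A @ x

dt, t0, T = 1e-3, 1.0, 50.0          # start at t0 > 0 to avoid the 3/t singularity
x_gf = np.array([1.0, 1.0])          # gradient flow state
x_ac = np.array([1.0, 1.0])          # accelerated flow position
v_ac = np.zeros(2)                   # accelerated flow velocity

for t in np.arange(t0, T, dt):
    # first-order gradient flow: dX/dt = -grad f(X)
    x_gf = x_gf - dt * grad_f(x_gf)
    # second-order accelerated gradient flow:
    #   d^2X/dt^2 + (3/t) dX/dt + grad f(X) = 0,
    # integrated as the first-order system (X, V) with V = dX/dt
    v_ac = v_ac + dt * (-(3.0 / t) * v_ac - grad_f(x_ac))
    x_ac = x_ac + dt * v_ac

print(f"f at time T, gradient flow:             {f(x_gf):.2e}")
print(f"f at time T, accelerated gradient flow: {f(x_ac):.2e}")
```

On this kind of instance and horizon, the accelerated flow typically reaches a much smaller objective value, and a plot of the trajectory shows the oscillations mentioned above; for strongly convex problems, though, gradient flow eventually catches up thanks to its exponential rate, so the sketch is only meant to illustrate the convex-case contrast.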
