Single‑loop Newton method reaches optimal second‑order solutions for a common bilevel problem
Bilevel optimization describes problems where one decision (the upper level) depends on the solution of another optimization problem (the lower level). This paper studies a standard but important case: the upper objective can be nonconvex while the lower objective is smooth and strongly convex in its own variable. The authors propose a new method that provably finds points that are stationary in a stronger, second‑order sense, meaning the gradient is small and the Hessian has no direction of significantly negative curvature. The method does so with an optimal dependence on the desired accuracy ε, namely O(ε^{-1.5}).
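For concreteness, one commonly used formalization of this second‑order notion is written out here. It is a sketch of the standard definition rather than a quotation from the paper, whose exact constants may differ. The symbol ρ (a Lipschitz constant of the Hessian of Φ) is introduced only for illustration, and Φ denotes the upper objective after the lower variable has been eliminated.

```latex
\[
\|\nabla \Phi(x)\| \le \varepsilon
\quad\text{and}\quad
\lambda_{\min}\!\left(\nabla^2 \Phi(x)\right) \ge -\sqrt{\rho\,\varepsilon},
\qquad \Phi(x) := f\bigl(x, y^*(x)\bigr).
\]
```

A first‑order stationary point only has to satisfy the first inequality, so a saddle point or a local maximum can pass that test. The second inequality rules those out, up to a tolerance.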
The paper gives two algorithms. The first is a double‑loop baseline (DLCRN) that adapts the cubic regularized Newton method to the bilevel setting; it achieves the optimal outer‑loop rate but requires repeating inner solves of the lower‑level problem. The main contribution is a single‑loop cubic regularized Newton method (SLCRN). SLCRN pairs one gradient step on the lower level with one Newton step to build an approximate “hypergradient” (the gradient of the upper objective after eliminating the lower variable). The authors prove SLCRN attains a deterministic O(ε^{-1.5}) total oracle complexity in ε, which matches the best known lower bound for finding second‑order stationary points.
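To make the cubic regularized Newton step concrete, here is a minimal one‑dimensional sketch in Python. It is not the paper's DLCRN or SLCRN, and it has no bilevel structure. The test function x⁴/4 − x²/2, the constant M = 12, and all function names are assumptions chosen for illustration. The run starts at x = 0, where the gradient is exactly zero and the curvature is negative, so a plain gradient step would not move.

```python
import math


def cubic_newton_step(grad, hess, M):
    """Return the global minimizer of the 1-D cubic model
    m(s) = grad*s + 0.5*hess*s**2 + (M/6)*abs(s)**3, with M > 0."""
    # The step length r >= 0 solves (M/2)*r**2 + hess*r - |grad| = 0.
    r = (-hess + math.sqrt(hess * hess + 2.0 * M * abs(grad))) / M
    # Move against the gradient; when grad == 0 both signs are equally good,
    # and this sketch arbitrarily picks the positive one.
    direction = -1.0 if grad > 0 else 1.0
    return direction * r


# Illustrative nonconvex objective f(x) = x**4/4 - x**2/2 (an assumption):
# local maximum at x = 0, minima at x = -1 and x = +1.
def f_prime(x):
    return x**3 - x


def f_second(x):
    return 3.0 * x * x - 1.0


def run(x0=0.0, M=12.0, iters=20):
    """Iterate cubic regularized Newton steps from x0.
    M = 12 bounds |f'''(x)| = 6|x| on [-2, 2]; it is a demo choice."""
    x = x0
    for _ in range(iters):
        x += cubic_newton_step(f_prime(x), f_second(x), M)
    return x


x_final = run()
print(x_final, f_prime(x_final), f_second(x_final))
```

At x = 0 the model has zero gradient and curvature −1, so the step has length 2/M and the iterate leaves the stationary point deterministically, without random perturbation. In higher dimensions the same model has to be minimized over a vector step, which needs an eigenvalue computation or an iterative subproblem solver instead of this closed form.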
Why this is nontrivial: to optimize the outer variable one needs the hypergradient ∇Φ(x) = ∇_x f(x, y*(x)) − ∇²_{xy} g(x, y*(x)) [∇²_{yy} g(x, y*(x))]^{-1} ∇_y f(x, y*(x)). Computing this exactly requires the exact lower‑level solution y*(x) and a Hessian inverse product, neither of which is practical. Existing practical approaches either approximate y*(x) with an inner iterative solver and then approximate the Hessian inverse (Approximate Implicit Differentiation), or use first‑order schemes. The SLCRN design sidesteps expensive inner solves by using a carefully chosen single gradient step on the lower level together with a Newton step for the hypergradient, and it adds a cubic regularizer (a standard trick in second‑order methods) to ensure deterministic escape from saddle points.
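The hypergradient formula can be checked numerically on a toy problem where y*(x) is available in closed form. The sketch below uses a quadratic instance made up for illustration: g(x, y) = ½ yᵀAy − xᵀy and f(x, y) = ½‖y − T‖² + ½C‖x‖², with A, T, C and every function name being assumptions, not taken from the paper. It evaluates the formula exactly and compares it with a finite‑difference gradient of Φ(x) = f(x, y*(x)). It does not implement the single‑step approximation used by SLCRN.

```python
def solve2(A, b):
    """Solve the 2x2 linear system A z = b by Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det]


# Hypothetical problem data (assumptions for this demo).
A = [[2.0, 0.5], [0.5, 1.0]]  # Hessian of g in y: symmetric positive definite
T = [1.0, -1.0]               # target in the upper objective
C = 0.1                       # weight of the ||x||^2 term


def lower_solution(x):
    """y*(x) = argmin_y 0.5*y'Ay - x'y, i.e. the solution of A y = x."""
    return solve2(A, x)


def upper_objective(x, y):
    """f(x, y) = 0.5*||y - T||^2 + 0.5*C*||x||^2."""
    fit = sum((yi - ti) ** 2 for yi, ti in zip(y, T))
    reg = sum(xi * xi for xi in x)
    return 0.5 * fit + 0.5 * C * reg


def hypergradient(x):
    """grad_x f - (d2g/dxdy) [d2g/dydy]^{-1} grad_y f, evaluated at y*(x)."""
    y = lower_solution(x)
    grad_x_f = [C * xi for xi in x]
    grad_y_f = [yi - ti for yi, ti in zip(y, T)]
    v = solve2(A, grad_y_f)  # Hessian-inverse product [d2g/dydy]^{-1} grad_y f
    # For this g the cross derivative d2g/dxdy is -I, so -(-I) v = +v.
    return [gx + vi for gx, vi in zip(grad_x_f, v)]


def finite_difference(x, h=1e-6):
    """Central-difference gradient of Phi(x) = f(x, y*(x))."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        phi_p = upper_objective(xp, lower_solution(xp))
        phi_m = upper_objective(xm, lower_solution(xm))
        grad.append((phi_p - phi_m) / (2.0 * h))
    return grad


x0 = [0.3, -0.7]
print(hypergradient(x0))
print(finite_difference(x0))
```

Both printed vectors should agree to several digits. In a realistic problem neither call to `solve2` is cheap: the first stands in for solving the lower‑level problem, the second for the Hessian inverse product. Those are the two costs the text describes as impractical to compute exactly.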