
Learned early stopping cuts mixed‑integer solve time by over 60% while keeping near‑optimal guarantees

March 20, 2026 | arXiv: 2602.01476v1

Mixed‑integer optimization solvers often find very good solutions early but then spend most of their running time proving that those solutions are optimal. The authors train a neural network to estimate when a solver's current solution is already close enough to optimal. They then use a statistical tool called conformal prediction to calibrate a stopping rule with a probabilistic guarantee. On five problem families from the Distributional MIPLIB benchmark, the method cut solve time by more than 60% while keeping solutions within 0.1% of optimal with 95% probability.

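A typical solver for mixed‑integer programs (MIPs) runs a branch‑and‑bound search. For a minimization problem it tracks an upper bound, which is the objective value of the best feasible solution found so far, and a lower bound, which is a proven value that no feasible solution can beat. Optimality is certified only when the two bounds meet. In many cases the upper bound reaches the optimal value early, while the lower bound takes much longer to rise and close the gap. The paper shows an example where an optimal solution appears at about 57 seconds but the solver does not certify optimality until about 120 seconds. Standard termination rules, which wait for the bounds to meet or for the relative gap to fall below a tolerance, are therefore often conservative: the solver keeps working long after it already holds the answer.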
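To make the two kinds of gap concrete, here is a minimal sketch. It assumes a minimization problem and the relative-gap convention |UB - LB| / |UB|. Gurobi's MIPGap uses this form, but other solvers choose a different denominator. The bound values in the trace are made up for illustration and do not come from the paper. The certified gap is what a standard termination rule can see. The true gap requires the optimal value, which is only known after the fact.

```python
import math

# Tiny floor to avoid dividing by zero when the objective value is 0.
_EPS = 1e-10


def certified_gap(upper: float, lower: float) -> float:
    """Relative gap a solver can certify (minimization): |UB - LB| / |UB|.

    Assumed convention; real solvers differ in the exact denominator.
    """
    if math.isinf(upper) or math.isinf(lower):
        return math.inf  # no incumbent or no finite bound yet
    return abs(upper - lower) / max(abs(upper), _EPS)


def true_gap(upper: float, optimum: float) -> float:
    """Relative distance from the incumbent to the (normally unknown) optimum."""
    return abs(upper - optimum) / max(abs(optimum), _EPS)


# Hypothetical bound trace: (label, upper bound, lower bound); the optimum is 100.
OPTIMUM = 100.0
trace = [
    ("early", 130.0, 80.0),
    ("optimum found", 100.0, 90.0),
    ("proved", 100.0, 100.0),
]

for label, ub, lb in trace:
    print(f"{label:>14}: certified={certified_gap(ub, lb):.3f}  "
          f"true={true_gap(ub, OPTIMUM):.3f}")
```

In the "optimum found" row, the incumbent is already optimal, so the true gap is 0. The certified gap, however, is still 10%. An early-stopping rule tries to exploit exactly this window.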

To exploit this slack, the researchers train a recurrent neural network, an LSTM (long short‑term memory), to predict the true optimality gap from the solver's state at a given time. The prediction estimates how far the current best solution might be from the true optimum. They then apply conformal prediction, a calibration method that turns these predictions into a stopping threshold with a formal, finite‑sample probabilistic guarantee. During solving, if the predicted gap falls below that calibrated threshold, the solver stops and returns the current best feasible solution.
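One standard way to build such a threshold is split conformal prediction on the model's under-prediction errors. The sketch below assumes that recipe; the paper's exact nonconformity score and calibration procedure may differ. The function names and toy numbers are hypothetical. The method works in three steps:

1. Collect predicted and true gaps on held-out calibration points.
2. Take the finite-sample-corrected (1 − α) quantile q of the residuals (true minus predicted).
3. Stop when the predicted gap is at most ε − q.

For a new exchangeable point, the true gap is then at most predicted + q with probability at least 1 − α. The stopping rule in step 3 therefore targets a true gap of at most ε. Here ε = 0.1% and α = 0.05.

```python
import math
from typing import Sequence


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """(1 - alpha) split-conformal quantile with the finite-sample correction."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # 1-based rank of the order statistic
    if k > n:
        return math.inf  # too few calibration points for this alpha: never stop
    return sorted(scores)[k - 1]


def calibrate_threshold(
    pred_gaps: Sequence[float],
    true_gaps: Sequence[float],
    epsilon: float = 1e-3,  # target: at most 0.1% suboptimal
    alpha: float = 0.05,    # miscoverage level: 95% probability
) -> float:
    """Hypothetical helper: threshold tau on the *predicted* gap."""
    if len(pred_gaps) != len(true_gaps):
        raise ValueError("pred_gaps and true_gaps must have the same length")
    # Nonconformity score: how much the model under-predicts the true gap.
    scores = [t - p for p, t in zip(pred_gaps, true_gaps)]
    q = conformal_quantile(scores, alpha)
    # true <= pred + q (w.p. >= 1 - alpha), so pred <= epsilon - q implies true <= epsilon.
    return epsilon - q


def should_stop(pred_gap: float, tau: float) -> bool:
    """Stop and return the incumbent once the predicted gap is at or below tau."""
    return pred_gap <= tau


if __name__ == "__main__":
    # Made-up calibration data, not from the paper.
    pred = [0.0004, 0.0010, 0.0002, 0.0030, 0.0007]
    true = [0.0005, 0.0009, 0.0006, 0.0032, 0.0007]
    # alpha=0.2 only because five points are too few for alpha=0.05.
    tau = calibrate_threshold(pred, true, alpha=0.2)
    print(f"tau = {tau:.5f}; stop at predicted gap 0.0001? {should_stop(0.0001, tau)}")
```

With too few calibration points for the chosen α, the quantile is infinite and the threshold becomes −∞, so the rule never stops early. The guarantee is marginal: it holds on average over exchangeable calibration and test points. Checking the rule at many time steps within one solve is a sequential question that this sketch does not address.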

This approach differs from earlier machine‑learning work that speeds up individual parts of the solver (for example, learned branching rules or heuristics). Instead, it learns when it is safe to stop. Across five families of related problems, the reported experiments show large time savings while preserving solution quality at the stated risk level (at most 0.1% suboptimal with 95% probability).