Choosing the regularization strength by minimizing an unbiased risk gives near‑optimal recovery for many linear inverse problems
This paper studies how to pick the tuning parameter in a large class of linear reconstruction methods for inverse problems. The task is to recover an unknown signal f from noisy measurements Y = T f + noise, where T is a smoothing linear operator. Because T smooths, naive inversion amplifies the noise, so the reconstruction must be regularized. The authors consider regularized solutions formed by applying a data‑dependent smoothing rule (called an ordered filter) to the usual least‑squares formula. The key question is how to choose the regularization parameter α from the data.
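As a concrete sketch of such a filter‑based reconstruction (using an illustrative setup in the singular basis of T, not the paper's notation: the names b, f, sigma and the Tikhonov filter choice are assumptions for the example):

```python
import numpy as np

# Illustrative model: in the singular basis of T the observations become
# y_k = b_k * f_k + sigma * xi_k, with decaying singular values b_k.
rng = np.random.default_rng(0)
n = 200
b = np.arange(1, n + 1, dtype=float) ** -1.0   # singular values of T, decaying to 0
f = np.arange(1, n + 1, dtype=float) ** -1.5   # coefficients of a smooth "true" signal
sigma = 0.01                                   # noise level
y = b * f + sigma * rng.standard_normal(n)     # noisy observations

def filtered_reconstruction(y, b, alpha):
    """Apply the Tikhonov filter lam_k = b_k^2 / (b_k^2 + alpha)
    to the naive least-squares coefficients y_k / b_k."""
    lam = b**2 / (b**2 + alpha)
    return lam * y / b

f_hat = filtered_reconstruction(y, b, alpha=1e-4)
```

As α grows, the filter shrinks the unstable high‑frequency coefficients more aggressively, trading variance for bias; the α = 0 limit is the unregularized least‑squares inversion.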
The rule they study picks α by minimizing an unbiased estimator of the prediction error, an idea that goes back to Mallows' Cp and to Stein's unbiased risk estimator. Here the prediction error measures how far the reconstruction, pushed back through the forward operator T, lies from the noiseless data T f; crucially, it can be estimated from the observations without knowing f. This data‑driven choice of α is common in practice, but a general theoretical justification covering many filter‑based methods was missing.
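A minimal sketch of this Mallows‑Cp‑style selection rule, again in an illustrative diagonal setup (the names and the Tikhonov filter family are assumptions for the example, not the paper's notation):

```python
import numpy as np

# Illustrative setup: y_k = b_k * f_k + sigma * xi_k in the singular basis.
rng = np.random.default_rng(0)
n = 200
b = np.arange(1, n + 1, dtype=float) ** -1.0
f = np.arange(1, n + 1, dtype=float) ** -1.5
sigma = 0.01
y = b * f + sigma * rng.standard_normal(n)

def ure(alpha):
    """Unbiased estimate of the prediction risk E||T f_hat - T f||^2:
    residual sum of squares, plus a 2*sigma^2 * (sum of filter weights)
    penalty, minus the constant n*sigma^2 (which does not affect the
    minimizer)."""
    lam = b**2 / (b**2 + alpha)
    resid = np.sum(((1 - lam) * y) ** 2)
    return resid + 2 * sigma**2 * np.sum(lam) - n * sigma**2

# Minimize the unbiased risk estimate over a grid of candidate alphas.
alphas = np.logspace(-8, 0, 200)
alpha_hat = min(alphas, key=ure)
```

The trace‑type penalty 2σ²Σλ_k is what corrects the naive residual for the optimism of fitting the same data twice; without it, minimization would always drive α to zero.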
The main result is an oracle inequality. Roughly speaking, this means the reconstruction built with the data‑chosen α performs (in the usual squared‑error sense for the signal itself) almost as well as the best choice of α that could be made with knowledge of the truth. From this inequality the authors deduce that the empirical risk minimizer is order‑optimal in the minimax sense over common smoothness classes. In plain terms: under the paper's assumptions, the data‑driven rule attains the best possible rate of error decay as the noise level goes to zero.
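The oracle comparison can be illustrated empirically in the same toy setup (all names and the Tikhonov filter family are illustrative assumptions): compute the signal‑space error of the URE‑chosen α and compare it with the error of the "oracle" α chosen using the true f, which is unavailable in practice.

```python
import numpy as np

# Illustrative diagonal model: y_k = b_k * f_k + sigma * xi_k.
rng = np.random.default_rng(1)
n = 200
b = np.arange(1, n + 1, dtype=float) ** -1.0
f = np.arange(1, n + 1, dtype=float) ** -1.5
sigma = 0.01
y = b * f + sigma * rng.standard_normal(n)

def recon(alpha):
    lam = b**2 / (b**2 + alpha)
    return lam * y / b

def ure(alpha):
    """Unbiased estimate of the prediction risk (up to a constant)."""
    lam = b**2 / (b**2 + alpha)
    return np.sum(((1 - lam) * y) ** 2) + 2 * sigma**2 * np.sum(lam)

alphas = np.logspace(-8, 0, 200)
alpha_ure = min(alphas, key=ure)                 # data-driven choice
alpha_oracle = min(alphas,                       # uses the true f: not available
                   key=lambda a: np.sum((recon(a) - f) ** 2))

err_ure = np.sum((recon(alpha_ure) - f) ** 2)
err_oracle = np.sum((recon(alpha_oracle) - f) ** 2)
# An oracle inequality asserts err_ure stays within a controlled factor
# of err_oracle (plus lower-order terms); here we just compare the two.
```

Note the subtlety the paper addresses: URE targets the *prediction* risk (in the image of T), while the oracle inequality controls the error for the *signal itself*, which is a strictly harder statement for ill‑posed T.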
The analysis requires some standard technical conditions. The forward operator T is assumed injective, compact and of Hilbert–Schmidt type (so its singular values decay to zero). The family of reconstruction rules must form an ordered filter (a smooth, monotone way of trading bias against variance), and the unknown signal is assumed to have a certain smoothness relative to T (a source condition). The results are asymptotic as the noise level becomes small, and the authors assume the noise variance is known. They also present numerical simulations supporting the theory at realistic noise levels.
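A small Monte Carlo sketch in the spirit of such simulations (purely illustrative, not the paper's experiments): as the noise level σ shrinks, the average signal‑space error of the URE‑chosen reconstruction should decay, consistent with the asymptotic theory.

```python
import numpy as np

# Illustrative diagonal model with decaying singular values and a smooth signal.
rng = np.random.default_rng(2)
n = 200
b = np.arange(1, n + 1, dtype=float) ** -1.0
f = np.arange(1, n + 1, dtype=float) ** -1.5
alphas = np.logspace(-10, 0, 200)

def mean_error_at_noise_level(sigma, reps=20):
    """Average signal-space squared error of the URE-tuned Tikhonov
    reconstruction over `reps` independent noise draws."""
    errs = []
    for _ in range(reps):
        y = b * f + sigma * rng.standard_normal(n)
        def ure(alpha):
            lam = b**2 / (b**2 + alpha)
            return np.sum(((1 - lam) * y) ** 2) + 2 * sigma**2 * np.sum(lam)
        alpha_hat = min(alphas, key=ure)
        lam = b**2 / (b**2 + alpha_hat)
        errs.append(np.sum((lam * y / b - f) ** 2))
    return float(np.mean(errs))

# Errors at decreasing noise levels; expect a decreasing trend overall.
errors = [mean_error_at_noise_level(s) for s in (0.1, 0.01, 0.001)]
```

The decay rate one observes here depends on the interplay between the smoothness of f and the ill‑posedness of T, which is exactly what the source condition in the paper quantifies.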