A simple offline fix for recommender alignment: weight examples by exp(reward/λ) | arXiv News