Fast-Slow Training: pairing prompts with model weights to make LLMs learn more quickly and forget less
This paper introduces Fast-Slow Training (FST), a way to make large language models adapt more quickly to new tasks while keeping their general skills. The main idea is to split learning into two channels. "Slow" learning changes the model's internal parameters; "fast" learning changes the text context, or prompts, that the model sees. The fast channel can be updated cheaply and often, while the slow channel changes only when needed. Together they let the model learn task details without permanently overwriting its general behavior.
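As a concrete picture of the split, here is a minimal sketch. The class and method names (FastSlowLearner, train_step, generate) are hypothetical, chosen for illustration rather than taken from the paper:

```python
# Minimal sketch of the two learning channels. All names here are
# assumptions for illustration, not the paper's implementation.

class FastSlowLearner:
    def __init__(self, model, base_prompt: str):
        self.model = model         # slow channel: internal parameters
        self.prompt = base_prompt  # fast channel: editable text context

    def adapt_fast(self, new_prompt: str) -> None:
        # Cheap and frequent: swap the prompt; the weights are untouched.
        self.prompt = new_prompt

    def adapt_slow(self, batch) -> None:
        # Expensive and rare: a gradient step on the parameters,
        # conditioned on the current prompt.
        self.model.train_step(prompt=self.prompt, batch=batch)

    def answer(self, question: str) -> str:
        return self.model.generate(self.prompt + "\n" + question)
```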
The researchers implement the slow channel with reinforcement learning from verifiable rewards: the model's parameters are updated only when an automatic checker can tell whether an answer is correct, as in math or code tasks. The fast channel is a population of optimized prompts, updated by a method called GEPA, a reflective evolutionary procedure that proposes and mutates text prompts using critiques from a frozen reflection model. During training the system alternates: prompt candidates are evolved, then the model parameters are updated while conditioning on those prompts. The prompt set is kept diverse as a Pareto frontier, so different prompts can specialize to different problem types.
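To make the alternation concrete, here is a toy sketch of the loop with the expensive pieces abstracted into callables. Every name below is an assumption for illustration, not the paper's API: generate samples an answer from the current policy, verify is the automatic correctness checker, mutate stands in for GEPA's reflective mutation (which in the paper uses critiques from a frozen reflection model), and rl_step stands in for the verifiable-reward parameter update. The real system evaluates prompts on batches and runs many rollouts per update; this version compresses each to one step so the alternation stays visible:

```python
import random
from typing import Callable, Dict, List

def train_fst(
    tasks: List[dict],
    seed_prompts: List[str],
    generate: Callable[[str, dict], str],         # sample an answer from the policy
    verify: Callable[[str, dict], float],         # automatic checker: reward in {0, 1}
    mutate: Callable[[str], str],                 # stand-in for GEPA's reflective mutation
    rl_step: Callable[[str, dict, float], None],  # stand-in for the weight update
    steps: int = 100,
) -> List[str]:
    # Prompt population kept as a Pareto frontier: a candidate survives
    # if it is the best prompt on at least one task.
    frontier = list(seed_prompts)
    scores: Dict[str, List[float]] = {
        p: [verify(generate(p, t), t) for t in tasks] for p in frontier
    }

    for _ in range(steps):
        # Fast channel: mutate a frontier prompt and keep the child only
        # if it strictly improves on some task (Pareto acceptance).
        child = mutate(random.choice(frontier))
        child_scores = [verify(generate(child, t), t) for t in tasks]
        if any(
            c > max(scores[p][i] for p in frontier)
            for i, c in enumerate(child_scores)
        ):
            frontier.append(child)
            scores[child] = child_scores

        # Slow channel: one rollout conditioned on a frontier prompt, then
        # a parameter update driven only by the verifiable reward.
        task = random.choice(tasks)
        prompt = random.choice(frontier)
        reward = verify(generate(prompt, task), task)
        rl_step(prompt, task, reward)

    return frontier
```

The design choice this illustrates: the fast channel touches only text, so a prompt mutation costs a few evaluations, while the slow channel touches weights and is gated behind the verifiable reward.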
They tested FST on several reasoning tasks: code-output prediction (CodeIO), a math benchmark (Polaris), and a multi-hop fact-verification task (HoVer-hard). Most experiments used a Qwen3-8B model. Compared with training only the model parameters with reinforcement learning, FST reached the same reward with fewer rollouts: up to 3× fewer on CodeIO and HoVer-hard, and 1.4× fewer on Polaris. At later stages FST also reached higher peak performance. The paper further reports that, at matched reward, models trained with FST stayed closer to the base model, with up to 70% lower Kullback–Leibler (KL) divergence, a statistical measure of how much the model's output distribution changed.
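For readers who want the definition behind that last number: the KL divergence between the trained model's output distribution P and the base model's distribution Q is

$$
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}
$$

It is zero when the two distributions are identical and grows as they drift apart, so 70% lower KL at matched reward means the FST-trained model stayed much closer to where it started.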