Researchers teach AI to learn by doing with automatically generated machine‑learning tasks
This paper describes a way to train AI systems by giving them many automatically created machine‑learning problems to solve. The authors build a pipeline that writes task descriptions, picks datasets, makes starter code, and then checks that each task actually runs. They use the resulting problem-solving examples to fine‑tune smaller models so the models get practice at the step‑by‑step work of real research instead of only reading final papers or code.
The pipeline works in stages. First it samples machine‑learning topics and asks a strong model to propose a task and a dataset. It checks that the proposed dataset actually exists by querying the Hugging Face API. Next it generates configuration files and starter code for an execution environment called ML‑Gym. If the task errors at runtime, the system feeds the error messages back into the generator and attempts automatic debugging for a limited number of rounds; tasks that still fail are discarded. The authors run the validated tasks at scale to collect many agent trajectories—sequences of reasoning, code edits, and runs—that record the full iterative process.
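The bounded validate-and-debug loop can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the task representation, and the round limit are all assumptions.

```python
# Hypothetical sketch of the paper's validate-and-debug loop.
# A generated task is run; on failure, the error is fed back to the
# generator for a bounded number of repair rounds, after which the
# task is discarded.

MAX_DEBUG_ROUNDS = 3  # round limit is an assumption, not from the paper

def validate_task(task, generate_fix, run_task, max_rounds=MAX_DEBUG_ROUNDS):
    """Return a runnable task, or None if it cannot be repaired in time."""
    for _ in range(max_rounds + 1):
        error = run_task(task)            # returns None on success
        if error is None:
            return task                   # task validated: keep it
        task = generate_fix(task, error)  # regenerate with error feedback
    return None                           # still failing: discard

# Toy stand-ins: a task that succeeds only after two repair rounds.
def run_task(task):
    return None if task["attempt"] >= 2 else "ImportError: missing dep"

def generate_fix(task, error):
    return {**task, "attempt": task["attempt"] + 1}

print(validate_task({"attempt": 0}, generate_fix, run_task))  # → {'attempt': 2}
```

The key design point is the hard cap on repair rounds: a task that cannot be made to run within the budget is dropped rather than debugged indefinitely, which keeps the pipeline fully automatic.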
To make training data, the team used a powerful model (referred to as GPT‑5) as a teacher to produce trajectories on the synthetic tasks. They then used these trajectories for supervised fine‑tuning of two student models (Qwen3‑4B and Qwen3‑8B). On the ML‑Gym benchmark, which contains 13 machine‑learning challenges, the fine‑tuned student models improved aggregate performance. The paper reports increases in the main aggregate metric (AUP) of about 9% for Qwen3‑4B and 12% for Qwen3‑8B compared with their baselines.