MEMO: using a memory bank to make multi-turn LLM games more stable and stronger
Researchers introduced MEMO, a method that makes long, multi-turn games played by large language model (LLM) agents both stronger and more consistent. These games suffer from big run-to-run swings: a small early mistake can change the whole interaction and make win rates unreliable. MEMO reduces that instability by changing what the models are given to think about at inference time, rather than changing the model weights.
At a high level, MEMO is a self-play framework that couples two ideas: retention and exploration. Retention means keeping a persistent memory bank that stores structured lessons extracted from past self-play games. The memory holds short summaries and actionable insights that can be injected back into future games as priors. Exploration means evolving a pool of prompts and contexts in tournament-style self-play. The system tests many candidate contexts, scores them by performance and uncertainty, and keeps the most reliable ones.
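The retention side can be pictured as a small key-value store of lessons that get rendered back into the prompt. This is an illustrative sketch, not the authors' implementation: the class name, fields, and scoring scheme are all assumptions.

```python
import uuid

class MemoryBank:
    """Toy persistent memory bank: stores short lessons extracted from
    past self-play games and renders the best ones as a prompt prior.
    All names and the scoring scheme are illustrative assumptions."""

    def __init__(self):
        # entry id -> {"lesson": short actionable insight, "score": usefulness}
        self.entries = {}

    def create(self, lesson, score=0.0):
        entry_id = str(uuid.uuid4())
        self.entries[entry_id] = {"lesson": lesson, "score": score}
        return entry_id

    def read(self, top_k=3):
        # Return the top-k lessons by score, best first
        ranked = sorted(self.entries.values(),
                        key=lambda e: e["score"], reverse=True)
        return [e["lesson"] for e in ranked[:top_k]]

    def update(self, entry_id, score):
        self.entries[entry_id]["score"] = score

    def delete(self, entry_id):
        self.entries.pop(entry_id, None)

    def as_prior(self, top_k=3):
        # Format the best lessons as a block injected into a future prompt
        lessons = self.read(top_k)
        return "Lessons from past games:\n" + "\n".join(f"- {l}" for l in lessons)

bank = MemoryBank()
bank.create("Open with a flexible bid; committing early loses tempo.", score=0.8)
bank.create("Track the opponent's repeated bluffs and call them late.", score=0.6)
print(bank.as_prior(top_k=2))
```

In this sketch the "prior" is just prepended text; the key property from the paper is that the bank persists across games, so lessons accumulate rather than being rediscovered each run.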
How it works in plain terms: MEMO proposes a set of candidate contexts (prompts plus memory priors), runs them against a baseline agent in many self-play games, and rates each candidate with TrueSkill, a Bayesian rating system that yields both a skill estimate and an uncertainty. MEMO prefers contexts that are strong and have low uncertainty. The framework also uses “prioritized replay” to revisit rare or decisive game states, and basic create/read/update/delete (CRUD) operations to manage memory entries. The authors tested MEMO on five text-based games drawn from TextArena and SPIN-Bench, using GPT-4o-mini and Qwen-2.5-7B-Instruct models with 2,000 self-play games per task.
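The "strong and low uncertainty" preference can be made concrete with TrueSkill's conventional conservative estimate, mu minus three sigma. The snippet below is a minimal sketch of that selection step only; the candidate names and rating values are invented, and real TrueSkill updates (e.g. via a rating library) would produce the mu/sigma pairs.

```python
# Rank candidate contexts by a conservative TrueSkill-style score,
# mu - 3*sigma: prefer candidates that are both strong (high mu)
# and well-tested (low sigma). All numbers here are made up.
candidates = {
    "ctx_A": {"mu": 28.0, "sigma": 1.2},  # strong and played many games
    "ctx_B": {"mu": 31.0, "sigma": 6.5},  # higher mean, but very uncertain
    "ctx_C": {"mu": 25.0, "sigma": 0.9},
}

def conservative(rating):
    # TrueSkill's usual conservative skill estimate
    return rating["mu"] - 3.0 * rating["sigma"]

best = max(candidates, key=lambda name: conservative(candidates[name]))
print(best)  # ctx_A: 28 - 3.6 = 24.4 beats ctx_B's 31 - 19.5 = 11.5
```

Note how the uncertainty penalty flips the ranking: ctx_B has the highest raw mean, but its wide sigma makes it a riskier keeper than the well-tested ctx_A, which is exactly the reliability bias described above.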