Study finds Transformers fail to predict catastrophic collapse in dynamical systems, while reservoir computing succeeds
This paper asks whether Transformer neural networks can act as reliable digital twins for physical systems that undergo sudden collapse when a control parameter crosses a critical value. The authors use the task of predicting a catastrophic collapse — the abrupt loss of normal oscillatory behavior after a parameter passes a bifurcation point — as a benchmark. They report that, across multiple models and settings, Transformers trained only on “safe” parameter regimes consistently fail to foresee collapse. By contrast, reservoir computing, a different machine‑learning approach, reliably predicts the transitions in their tests.
To test extrapolation, the researchers trained models on time series from a few safe parameter values (labelled p1, p2 and p3 in their protocol) and then asked the models to forecast behavior at an unseen parameter value p4 that lies past the critical point. The input to each model concatenated the state time series with a parameter channel encoding the control-parameter value. Before testing on the unseen collapsing regime, they validated training with multi-step forecasts at the training parameters.
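A minimal sketch of how such training inputs might be assembled (the window length, parameter values, and helper name here are illustrative assumptions, not the paper's actual settings): each sample pairs a window of the state time series with a constant channel holding the control-parameter value.

```python
import numpy as np

def make_windows(series, p, window=64):
    """Stack state windows with a constant parameter channel.

    series : 1-D array of system states generated at one parameter value
    p      : the control-parameter value used to generate `series`
    Returns shape (num_windows, window, 2):
    channel 0 = state, channel 1 = parameter (repeated).
    """
    windows = np.lib.stride_tricks.sliding_window_view(series, window)
    param = np.full_like(windows, p, dtype=float)
    return np.stack([windows, param], axis=-1)

# Train only on a few "safe" parameter values; the unseen p4 past the
# critical point is held out entirely (values here are stand-ins).
safe_params = [0.8, 0.9, 1.0]
train = np.concatenate(
    [make_windows(p * np.sin(0.1 * np.arange(500)), p) for p in safe_params]
)
print(train.shape)  # (num_windows_total, window, 2)
```

The parameter channel is what lets a single model condition its forecast on where the system sits relative to the bifurcation point.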
The paper contrasts two kinds of model. Reservoir computing is a recurrent, dynamical-system-style network that can act like an intrinsic simulator of the target system; previous work links its success to a form of synchronization with the target dynamics. Transformers rely on self-attention, a mechanism that by itself treats sequence entries as an unordered set, so temporal order enters only through added positional information. The authors suggest that this permutation-invariant attention may limit a Transformer's ability to learn how a system's temporal patterns change as a control parameter moves the system toward collapse.
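The order-insensitivity of bare attention can be seen directly in a toy example (this is an illustrative single-head sketch, not the paper's model): without positional encodings, permuting the input timesteps permutes the attention outputs identically, so any order-insensitive readout such as mean pooling cannot tell a time series from a shuffled copy of it.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention, no positional encoding."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over keys
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4))                       # 10 timesteps, 4 features
wq, wk, wv = (rng.normal(size=(4, 4)) for _ in range(3))
perm = rng.permutation(10)

out = self_attention(x, wq, wk, wv)
out_perm = self_attention(x[perm], wq, wk, wv)

# Equivariance: a permuted input yields the same rows, permuted.
assert np.allclose(out[perm], out_perm)
# Invariance of a pooled readout: the timestep mean is unchanged.
assert np.allclose(out.mean(axis=0), out_perm.mean(axis=0))
```

Real Transformers add positional encodings precisely to break this symmetry; the paper's point is that attention itself carries no intrinsic notion of temporal order.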
They applied the protocol to four representative systems of different complexity: a chaotic food‑chain model, a power‑system voltage model, the discrete Ikeda optical cavity map, and the Kuramoto–Sivashinsky partial differential equation (a spatio‑temporal chaotic system). Transformers could be trained to produce accurate multi‑step forecasts inside the safe regimes. But when tested on parameters that should produce transient chaos followed by decay, Transformers often continued to predict persistent oscillations like those seen during training, rather than the collapse. The authors tried multiple Transformer configurations and measures to reduce overfitting, such as widening the training parameter range or changing model size, without success. Reservoir computing models, in contrast, did predict the transitions reliably in these examples.
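The "intrinsic simulator" behavior of reservoir computing can be sketched with a minimal echo state network (all sizes, scalings, and the sine-wave target here are illustrative assumptions, not the paper's configurations): a fixed random recurrent reservoir is driven by the signal, only a linear readout is trained, and the trained network is then run closed-loop on its own predictions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200                                      # reservoir size (assumed)
w_in = rng.uniform(-0.5, 0.5, size=n)        # fixed random input weights
w = rng.normal(size=(n, n))
w *= 0.9 / np.max(np.abs(np.linalg.eigvals(w)))  # spectral radius 0.9

u = np.sin(0.2 * np.arange(3000))            # toy training signal
states = np.zeros((len(u), n))
x = np.zeros(n)
for t in range(1, len(u)):
    x = np.tanh(w @ x + w_in * u[t - 1])     # reservoir driven by the signal
    states[t] = x

# Train only the linear readout (ridge regression): state -> current value.
washout = 100
A, y = states[washout:], u[washout:]
w_out = np.linalg.solve(A.T @ A + 1e-6 * np.eye(n), A.T @ y)

# Closed loop: feed predictions back in, so the reservoir runs autonomously.
preds = []
v = u[-1]
for _ in range(50):
    x = np.tanh(w @ x + w_in * v)
    v = x @ w_out
    preds.append(v)

err = np.max(np.abs(np.array(preds) - np.sin(0.2 * np.arange(3000, 3050))))
print(err)  # closed-loop deviation from the true continuation
```

Because only the readout is trained, the recurrent dynamics remain a genuine dynamical system, which is the property prior work connects to synchronization with, and faithful continuation of, the target dynamics.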