Signal Processing | English | Published

Study finds adaptive quantized control more robust than a trained RL controller when packets are lost and the plant changes

July 1, 2026 | arXiv: 2606.32003v1

This paper compares two ways to control a simple but realistic class of engineered systems. The systems are linear, possibly unstable, and subject to two practical limits: the control signal is sent over a network with limited precision (quantization), and some control packets are lost in transmission. The authors test an adaptive, model-based controller called adaptive quantized control (AQC) against a deep reinforcement learning controller called Deep Deterministic Policy Gradient (DDPG). The main result is that DDPG can be faster and more damped inside the environment it was trained for, but AQC is consistently more robust when the model is uncertain, packets are lost, or the plant switches to a more unstable mode.

The study uses a discrete-time linear plant described by x(k+1) = A x(k) + η(k) B v(k). Here η(k) is a random binary variable that models packet reception (1) or loss (0). The control input v(k) is a quantized version of a nominal control law. The quantizer is logarithmic, meaning it uses coarse levels far from the origin and finer levels near zero. The authors also test a scenario where the plant’s dynamics switch during operation from an unstable nominal system to a more unstable one.
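The plant model above can be made concrete with a short simulation. What follows is a minimal sketch, not the authors' code: it assumes a standard logarithmic quantizer with levels ±u0·ρ^i, independent (Bernoulli) packet loss, and a fixed hand-picked gain K in place of the paper's adaptive law. The function names `log_quantize` and `simulate`, and the values of `rho`, `u0`, `loss_prob`, A, B, and K, are all illustrative assumptions.

```python
import numpy as np

def log_quantize(v, rho=0.5, u0=1.0):
    """Logarithmic quantizer: maps v to the nearest-sector level +/- u0 * rho**i, or 0.

    Levels are dense near zero and sparse far from it. The relative error
    satisfies |q(v) - v| <= delta * |v| with delta = (1 - rho) / (1 + rho).
    """
    v = np.atleast_1d(np.asarray(v, dtype=float))
    delta = (1.0 - rho) / (1.0 + rho)
    out = np.zeros_like(v)
    nz = v != 0
    mag = np.abs(v[nz])
    # Index i of the level whose sector (u_i/(1+delta), u_i/(1-delta)] contains mag.
    i = np.floor(np.log(mag * (1.0 - delta) / u0) / np.log(rho))
    out[nz] = np.sign(v[nz]) * u0 * rho ** i
    return out

def simulate(A, B, K, x0, steps=50, loss_prob=0.2, rho=0.5, seed=0):
    """Simulate x(k+1) = A x(k) + eta(k) B v(k) with v(k) = q(K x(k)).

    K is a fixed gain chosen for illustration only (an assumption, not the AQC law).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        v = log_quantize(K @ x, rho=rho)                    # quantized control input
        eta = 1.0 if rng.random() >= loss_prob else 0.0     # 1 = received, 0 = lost
        x = A @ x + eta * (B @ v)
        traj.append(x.copy())
    return np.array(traj)

# Hypothetical unstable scalar plant (|A| > 1) with a deadbeat-style gain.
A = np.array([[1.2]])
B = np.array([[1.0]])
K = np.array([[-1.2]])
traj = simulate(A, B, K, x0=[1.0], loss_prob=0.2)
print(abs(traj[-1, 0]))
```

In this scalar example each received packet shrinks the state by a factor of at most 1.2·δ = 0.4, while each lost packet lets it grow by 1.2, so the loss rate directly determines whether the loop settles. Setting `loss_prob=1.0` reproduces the open-loop divergence of the unstable plant.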

The two controllers work in different ways. The adaptive quantized controller is model-based and adaptive. It assumes the input matrix B is known and that the pair (A,B) is stabilizable. It uses a time-varying quantizer model and acknowledgment messages from the plant that tell the controller whether each packet arrived. Those acknowledgments feed an update law for the controller gains. The AQC design is built around Lyapunov-based stability conditions, which give formal guarantees that the closed-loop state will converge under the stated assumptions and packet-loss bounds.

The DDPG controller is a data-driven actor–critic method that learns a deterministic policy (the actor) and a value function (the critic) using feed-forward neural networks. In this study the DDPG agent is trained on the nominal model and is not given acknowledgment messages about packet delivery.
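To show what "a deterministic policy and a value function" mean in practice, here is a minimal sketch of the pieces of a generic DDPG agent as the algorithm is commonly described. It is not the paper's implementation: the layer sizes, the tanh action bound, the discount `gamma`, and the averaging rate `tau` are all assumptions, and the gradient steps that train the networks are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_state, n_action, hidden = 2, 1, 16    # hypothetical sizes, not from the paper

def init_mlp(n_in, n_out):
    """Random parameters for a one-hidden-layer feed-forward network."""
    return {"W1": rng.normal(0.0, 0.1, (hidden, n_in)), "b1": np.zeros(hidden),
            "W2": rng.normal(0.0, 0.1, (n_out, hidden)), "b2": np.zeros(n_out)}

def mlp(params, x):
    h = np.maximum(0.0, params["W1"] @ x + params["b1"])    # ReLU hidden layer
    return params["W2"] @ h + params["b2"]

def actor(params, state):
    """Deterministic policy: one bounded action per state, no sampling."""
    return np.tanh(mlp(params, state))

def critic(params, state, action):
    """Action-value estimate Q(s, a) as a scalar."""
    return mlp(params, np.concatenate([state, action]))[0]

def critic_target(reward, next_state, actor_tgt, critic_tgt, gamma=0.99):
    """Bootstrapped regression target y = r + gamma * Q'(s', mu'(s'))."""
    next_action = actor(actor_tgt, next_state)
    return reward + gamma * critic(critic_tgt, next_state, next_action)

def soft_update(target, online, tau=0.005):
    """Slowly track the online network: theta' <- tau*theta + (1 - tau)*theta'."""
    return {k: tau * online[k] + (1.0 - tau) * target[k] for k in target}

actor_params = init_mlp(n_state, n_action)
critic_params = init_mlp(n_state + n_action, 1)
state = np.array([1.0, -0.5])
action = actor(actor_params, state)
print(action, critic(critic_params, state, action))
```

Note that the actor's only input here is the state. A policy of this form has no channel for packet-delivery acknowledgments, which matches the setup described above, where the DDPG agent does not receive them.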