Modeling and forecasting Mumbai’s monthly malaria counts: why counting and time patterns both matter
This paper compares methods for predicting monthly malaria cases in the Mumbai region and finds that models which account for both overdispersion (extra variation in the counts) and time dependence give better forecasts. The authors used routine surveillance counts of Plasmodium vivax from the Health Management Information System for 2012–2019, together with a set of meteorological variables. Their main conclusion is that GLARMA (Generalized Linear Autoregressive Moving Average), a count time‑series model, fitted with a negative binomial distribution produced more accurate and more stable forecasts than simpler approaches.
The researchers first fitted standard count regression models. Poisson regression suggested links between cases and environmental factors but failed a basic diagnostic: the data showed overdispersion, meaning the variance was larger than the mean. A negative binomial regression, which allows extra variance, fit the data better. That model also showed that seasonal patterns (month‑to‑month differences) were more strongly associated with malaria counts than any single climate variable.
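The overdispersion check described above can be illustrated with a quick variance-to-mean ratio. This is only a minimal sketch, not the diagnostic the authors ran: the function name `dispersion_index` and the toy counts are made up for illustration, and a formal test would work on the fitted model's residuals rather than on raw counts.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean.

    Under a Poisson model the two are equal, so a ratio well above 1
    is a rough sign of overdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; dispersion index is undefined")
    return variance(counts) / m


# Made-up monthly counts for illustration only (NOT the paper's data).
toy_counts = [12, 15, 9, 40, 85, 130, 97, 60, 31, 18, 11, 10]
ratio = dispersion_index(toy_counts)
print(f"variance/mean = {ratio:.1f}")
```

A strongly seasonal series like the toy one gives a ratio far above 1, which is the situation where a negative binomial model, with its extra variance parameter, tends to fit better than Poisson.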
Residual checks on those baseline models revealed serial dependence: recent malaria counts helped predict current counts in ways not captured by the regression terms. To deal with that, the team used the GLARMA framework. GLARMA keeps the familiar regression setup (counts linked to covariates) but adds a latent term that captures autocorrelation, built from past prediction errors. This lets the model reflect the fact that disease counts are influenced by recent transmission and other carry‑over effects.
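To make the idea of a latent term built from past prediction errors concrete, here is a simplified sketch of a GLARMA-style recursion. It assumes a log link, a single moving-average lag, and Pearson residuals scaled by the negative binomial variance; the names `eta`, `theta`, and `alpha` are illustrative, and the lag structure the authors actually fitted is not stated in this summary.

```python
import math


def glarma_means(y, eta, theta, alpha):
    """Conditional means from a one-lag GLARMA-style recursion (log link).

    y     : observed counts, one per time point
    eta   : regression part of the linear predictor at each time point
    theta : weight on the previous Pearson residual (assumed single lag)
    alpha : negative binomial shape; variance = mu + mu**2 / alpha
    """
    mus = []
    prev_resid = 0.0  # no past error exists before the first observation
    for y_t, eta_t in zip(y, eta):
        z_t = theta * prev_resid           # latent term from the past error
        mu_t = math.exp(eta_t + z_t)       # conditional mean on the count scale
        var_t = mu_t + mu_t ** 2 / alpha   # negative binomial variance
        prev_resid = (y_t - mu_t) / math.sqrt(var_t)  # Pearson residual
        mus.append(mu_t)
    return mus
```

With `theta = 0` the recursion collapses to an ordinary regression. With a positive `theta`, a month that came in above its predicted mean pushes the next month's predicted mean upward, which is the carry-over effect the paragraph describes.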
For forecasting, the authors used simulation‑based methods and reported the median of simulated conditional means to reduce the risk of inflated forecasts. They evaluated forecasts with rolling time‑series cross‑validation across four forecast horizons and compared GLARMA to competing regression approaches and a Gaussian ARIMA benchmark. The GLARMA negative binomial model consistently showed better predictive performance and greater stability under this testing scheme.
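The rolling evaluation scheme can be sketched as follows. This is a generic rolling-origin loop under stated assumptions, not the authors' code: mean absolute error stands in for whatever accuracy measures the paper used, and `last_value_forecast` is a deliberately naive placeholder where a fitted GLARMA, regression, or ARIMA forecaster would go.

```python
def rolling_origin_errors(series, forecast_fn, min_train, horizons):
    """Mean absolute error per horizon under rolling-origin evaluation.

    At each origin the forecaster sees only series[:origin] and is scored
    on the observation h steps ahead, for every h in `horizons`.
    """
    abs_errors = {h: [] for h in horizons}
    for origin in range(min_train, len(series)):
        train = series[:origin]
        for h in horizons:
            target = origin + h - 1  # index of the value h steps ahead
            if target < len(series):
                pred = forecast_fn(train, h)
                abs_errors[h].append(abs(series[target] - pred))
    # Skip horizons that never had a scoreable target.
    return {h: sum(e) / len(e) for h, e in abs_errors.items() if e}


def last_value_forecast(train, h):
    """Placeholder forecaster: repeat the last observed value."""
    return train[-1]


# Toy series for illustration only (NOT the paper's data).
errors = rolling_origin_errors([1, 2, 3, 4, 5, 6], last_value_forecast,
                               min_train=3, horizons=(1, 2))
```

Because each forecast uses only data available before the forecast origin, the resulting errors reflect out-of-sample performance, and comparing them across horizons shows how quickly accuracy degrades as the forecast reaches further ahead.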