
Deep hedging for S&P 500 options learns to under-hedge Black–Scholes deltas, but it is fragile in some market regimes

May 24, 2026. arXiv: 2605.21696v1

This paper asks what a learned deep-hedging strategy actually does when it is trained on real S&P 500 option trades, when it helps, and when it fails. The author trains reinforcement-learning agents on historical call-option episodes and compares their daily hedges to a daily-updated Black–Scholes delta hedge. The learning objective is a local downside-shortfall reward: the agent is penalized for negative daily hedging profit-and-loss (P&L) but not for equally large positive daily P&L. The learned policy therefore focuses on reducing bad daily losses rather than on matching classical replication targets.
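The asymmetry in this reward can be illustrated with a minimal sketch. It assumes a linear shortfall penalty and a long-call position hedged by a short index position; the summary does not give the paper's exact functional form, and the function and argument names below are hypothetical.

```python
def daily_hedged_pnl(d_option, d_spot, hedge_ratio, cost=0.0):
    """One-day P&L of a long call hedged by shorting `hedge_ratio` index units.

    Assumed accounting (not taken from the paper): option price change,
    minus the loss on the short index leg, minus transaction cost.
    """
    return d_option - hedge_ratio * d_spot - cost


def downside_shortfall_reward(pnl):
    """Assumed linear form: losses pass through as negative reward, gains count as zero."""
    return min(pnl, 0.0)


# Two moves of equal size receive very different rewards.
loss_reward = downside_shortfall_reward(-2.5)
gain_reward = downside_shortfall_reward(2.5)
print(loss_reward, gain_reward)
```

Under this assumed form the agent has no incentive to chase positive P&L; it is only pushed away from states that produce daily losses.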

In tests that walk forward year by year from 2015 to 2023, the agents typically learn a systematic "delta haircut": they short less of the underlying index than the Black–Scholes delta would prescribe. The paper links this correction to a common market pattern in which index drops tend to come with higher implied volatility. When implied volatility rises as the index falls, call prices fall by less than a fixed-volatility model would predict, so the option's effective sensitivity to the index is smaller than its Black–Scholes delta. A smaller short position can therefore cut the frequency or size of adverse daily P&L when judged by the downside-focused reward.
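The direction of the haircut can be seen from a first-order chain-rule approximation, dC/dS ≈ delta + vega × d(vol)/d(spot). The sketch below uses that textbook approximation with made-up inputs. It is not the paper's learned policy, and the assumed volatility response (`dvol_dspot`) is a hypothetical number chosen only to show the sign of the effect.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bs_call_delta_vega(spot, strike, vol, t, rate=0.0):
    """Black-Scholes call delta and vega, assuming no dividends."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * t) / (vol * math.sqrt(t))
    return norm_cdf(d1), spot * norm_pdf(d1) * math.sqrt(t)


def haircut_delta(spot, strike, vol, t, dvol_dspot, rate=0.0):
    """First-order hedge ratio when implied vol co-moves with spot."""
    delta, vega = bs_call_delta_vega(spot, strike, vol, t, rate)
    return delta + vega * dvol_dspot


# Hypothetical inputs: at-the-money call, 3 months to expiry, 20% implied vol,
# and implied vol rising by one vol point per 100-point index drop.
delta, vega = bs_call_delta_vega(5000.0, 5000.0, 0.20, 0.25)
adjusted = haircut_delta(5000.0, 5000.0, 0.20, 0.25, dvol_dspot=-0.01 / 100.0)
print(f"Black-Scholes delta: {delta:.3f}, adjusted hedge ratio: {adjusted:.3f}")
```

With a negative spot-volatility relationship the adjusted ratio sits below the Black–Scholes delta, which matches the sign of the haircut the agents learn. If the volatility response weakens toward zero, the adjustment vanishes and the plain delta is recovered.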

The author uses a model-free reinforcement-learning algorithm (Twin Delayed Deep Deterministic Policy Gradient, TD3) and a careful out-of-sample design: for each test year Y the agent is trained on earlier data, selected using year Y−1, and then evaluated on the held-out year Y. The agent and the Black–Scholes benchmark trade the same option episodes, rebalance on the same dates, and face the same transaction-cost-aware P&L accounting. This lets the study isolate what the learned policy is doing relative to the familiar daily Black–Scholes delta.
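The walk-forward design can be sketched as a generator of year splits. This is an illustration under two assumptions that the summary does not confirm: training uses only years strictly before the selection year Y−1, and the sample starts in a placeholder year (`FIRST_DATA_YEAR`).

```python
def walk_forward_splits(first_data_year, test_years):
    """Yield (train_years, selection_year, test_year) for each test year Y.

    Assumption: training years end before the selection year Y-1, so the
    selection and test years are never seen during training.
    """
    for y in test_years:
        selection_year = y - 1
        train_years = list(range(first_data_year, selection_year))
        if not train_years:
            raise ValueError(f"no training data before {selection_year}")
        yield train_years, selection_year, y


# Placeholder start year; the summary does not say where the sample begins.
FIRST_DATA_YEAR = 2010
splits = list(walk_forward_splits(FIRST_DATA_YEAR, range(2015, 2024)))
for train, select, test in splits:
    print(f"train {train[0]}-{train[-1]} | select {select} | test {test}")
```

Keeping the selection year separate from both training and testing means that model choice never uses information from the year being evaluated.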

The learned underhedging often improves accumulated reward and reduces terminal downside variance and conditional value-at-risk (CVaR) compared with Black–Scholes. But the benefit is regime-dependent. In 2022 the method produced losses in particular adverse daily states, revealing that the same underhedging that helps in many periods can fail badly when market moves and volatility responses depart from the usual pattern. In 2023 the paper finds another failure mode: when option P&L is dominated by spot moves and the volatility channel is unusually weak, underhedging can actually raise ordinary variance even if it had reduced downside dispersion in other years.
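The risk measures named here can be computed from a list of terminal P&Ls as in the sketch below. The definitions are common textbook choices rather than the paper's confirmed ones: CVaR is taken as the mean of the worst `alpha` fraction of outcomes, and downside variance as the mean squared shortfall below a zero target. The sample numbers are made up.

```python
def cvar(pnls, alpha=0.05):
    """Mean of the worst `alpha` fraction of P&L outcomes (at least one outcome)."""
    ordered = sorted(pnls)
    k = max(1, int(len(ordered) * alpha))
    return sum(ordered[:k]) / k


def variance(pnls):
    """Ordinary (population) variance: gains and losses both count."""
    mean = sum(pnls) / len(pnls)
    return sum((x - mean) ** 2 for x in pnls) / len(pnls)


def downside_variance(pnls, target=0.0):
    """Mean squared shortfall below `target`; outcomes above it contribute zero."""
    return sum(min(x - target, 0.0) ** 2 for x in pnls) / len(pnls)


# Made-up terminal P&Ls, for illustration only.
sample = [-4.0, -1.0, 0.0, 1.0, 4.0]
tail_risk = cvar(sample, alpha=0.4)
total_var = variance(sample)
down_var = downside_variance(sample)
print(tail_risk, total_var, down_var)
```

Because ordinary variance counts upside dispersion while downside variance ignores it, a strategy can look better on one measure and worse on the other, which is the kind of divergence the 2023 result describes.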