Making training targets match what a detector can actually see improves particle reconstruction models
This paper looks at a simple but important problem in machine learning for particle detectors. Modern ML methods try to reconstruct every particle in an event from low-level detector signals. But the truth targets used to train those models often come from idealized simulation that ignores the detector’s limited spatial resolution. That mismatch creates ambiguous training targets whenever two particles leave overlapping signals that the detector cannot tell apart.
The authors study this issue in the context of Particle Flow (PF) reconstruction. PF is a way to combine tracking and calorimeter measurements to make a list of particle candidates and their energies and directions. Using a GEANT4-based generic detector simulation called DICE, the team builds “SimShowers”: the calorimeter response associated with each particle that entered the calorimeter. They then run a hit-level merging algorithm that looks at the energy in each calorimeter cell and shares it among overlapping SimShowers. The algorithm computes a per-shower resolvability score and directional connection scores between showers to decide when to merge targets that are experimentally indistinguishable.
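The merging idea can be illustrated with a toy sketch. The summary does not give the paper's formulas, so the score definitions below are assumptions chosen for illustration: the "resolvability" of a shower is taken as the fraction of its energy in cells where it is the leading contributor, and the "connection" from shower a to shower b is the fraction of a's energy in cells led by b. The function names, the threshold, and the greedy merge order are all hypothetical.

```python
import numpy as np

def resolvability(E):
    """Per-shower fraction of energy in cells where that shower leads.

    E has shape (n_showers, n_cells). Assumed definition, not the paper's.
    """
    total = E.sum(axis=1)
    leader = E.argmax(axis=0)  # index of the leading shower in each cell
    own = np.array([E[i, leader == i].sum() for i in range(E.shape[0])])
    return np.divide(own, total, out=np.zeros(len(total)), where=total > 0)

def connection(E):
    """Directional score C[a, b]: fraction of a's energy in cells led by b."""
    n = E.shape[0]
    total = E.sum(axis=1)
    leader = E.argmax(axis=0)
    C = np.zeros((n, n))
    for a in range(n):
        if total[a] <= 0:
            continue
        for b in range(n):
            if a != b:
                C[a, b] = E[a, leader == b].sum() / total[a]
    return C

def merge_targets(E, threshold=0.5):
    """Greedily merge the least resolvable shower into its strongest neighbour.

    The threshold value is a hypothetical choice. Returns the merged energy
    matrix and, for each merged target, the list of original shower indices.
    """
    E = np.asarray(E, dtype=float).copy()
    groups = [[i] for i in range(E.shape[0])]
    while E.shape[0] > 1:
        r = resolvability(E)
        a = int(r.argmin())
        if r[a] >= threshold:
            break  # every remaining target is resolvable enough
        row = connection(E)[a]
        row[a] = -np.inf  # never merge a shower into itself
        b = int(row.argmax())
        E[b] += E[a]  # the merged target absorbs the cell energies
        groups[b] += groups[a]
        E = np.delete(E, a, axis=0)
        del groups[a]
    return E, groups

# Three showers over four cells; shower 1 is buried under shower 0.
E_demo = [[5, 4, 0, 0],
          [1, 3, 0, 0],
          [0, 0, 6, 2]]
merged, groups = merge_targets(E_demo)
print(groups)
```

In this toy case shower 1 never leads in any cell, so it is folded into shower 0, while shower 2 stays a separate target. Total energy is conserved by construction, since merging only adds rows together.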
One important variant they introduce is Particle-Flow-aware (PF-aware) merging. This version keeps the charged-particle constraints that PF needs, so merged targets remain consistent across tracker and calorimeter information. The detector model used in the study has realistic features: an electromagnetic and a hadronic calorimeter with cell sizes from a few to several square centimeters, and a 4 tesla magnetic field. The authors note several simplifications: no electronics readout or digitization is simulated; the tracker is kept inactive, with tracks represented by idealized helix extrapolations with a small momentum smearing; and a track-quality cut rejects cases where that approximation breaks down.
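To give a feel for what an idealized helix extrapolation involves, here is a minimal sketch using the standard relation between transverse momentum and curvature radius in a uniform field, R [m] ≈ pT [GeV] / (0.3 · B [T]). Only the 4 tesla field comes from the text. The calorimeter radius, the smearing width, the sign convention, and the "track never reaches the calorimeter" rejection are assumptions for illustration, not the study's actual settings or its track-quality cut.

```python
import math
import random

B_TESLA = 4.0     # field strength quoted in the text
R_CALO_M = 1.5    # hypothetical calorimeter inner radius (assumption)
REL_SMEAR = 0.01  # hypothetical relative pT smearing width (assumption)

def helix_radius(pt_gev, b_tesla=B_TESLA):
    """Curvature radius in metres for a unit-charge particle."""
    return pt_gev / (0.3 * b_tesla)

def extrapolate(pt_gev, eta, phi0, charge, r_calo=R_CALO_M, b_tesla=B_TESLA):
    """Propagate a track from the origin to the cylinder r = r_calo.

    Returns (phi, z) at the cylinder, or None if the helix curls up before
    reaching it. Rejecting such tracks is one plausible quality criterion,
    not necessarily the one used in the study.
    """
    radius = helix_radius(pt_gev, b_tesla)
    if 2.0 * radius < r_calo:
        return None  # track loops inside the tracker volume
    bend = math.asin(r_calo / (2.0 * radius))
    # Assumed convention: field along +z, positive charges bend toward lower phi.
    phi = phi0 - charge * bend
    # Transverse arc length times pz/pT = sinh(eta) gives the z displacement.
    z = 2.0 * radius * bend * math.sinh(eta)
    return phi, z

def smear_pt(pt_gev, rng, rel_sigma=REL_SMEAR):
    """Apply a small Gaussian relative smearing to the transverse momentum."""
    return pt_gev * (1.0 + rng.gauss(0.0, rel_sigma))

rng = random.Random(0)
pt_measured = smear_pt(5.0, rng)
print(extrapolate(pt_measured, eta=0.5, phi0=0.0, charge=+1))
print(extrapolate(0.6, eta=0.0, phi0=0.0, charge=-1))  # too soft: None
```

With these assumed numbers, a 0.6 GeV track has a 0.5 m curvature radius and cannot reach a 1.5 m cylinder, so it is dropped; higher-momentum tracks arrive with a small azimuthal shift that depends on the sign of the charge.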