New method lets speech‑aware large language models handle multi‑speaker audio by conditioning the encoder on diarization | arXiv News