One-step gradient delay need not stop large-scale asynchronous pipeline training of LLMs | arXiv News