LIBERO-Safety: a benchmark to test physical and semantic risks for vision‑language‑action robots | arXiv News