Tuna-2 drops pretrained vision encoders and learns directly from pixels for image understanding and generation | arXiv News