When to align representations and when to predict across data types: a phase diagram for multimodal learning | arXiv News