Making transformers thinner in the middle can save compute and slightly improve language modeling | arXiv News