Improving Transformers with Dynamically Composable Multi-Head Attention Paper • 2405.08553 • Published May 14, 2024