Zipper-LoRA
Abstract
Speech Large Language Models (Speech-LLMs) have emerged as a powerful approach for automatic speech recognition (ASR) by aligning speech encoders with large language models. However, adapting these systems to multilingual settings with imbalanced data distributions remains challenging.
In such scenarios, a stability-plasticity dilemma often arises:
- Fully shared Parameter-Efficient Fine-Tuning (PEFT) can cause cross-lingual interference that degrades under-represented languages
- Fully language-specific tuning blocks the beneficial cross-lingual knowledge transfer needed for low-resource tasks
To address this, we propose Zipper-LoRA, a novel rank-level decoupling framework with three variants (Static, Hard, and Soft) that dynamically synthesizes LoRA updates from shared and language-specific subspaces.
Key Features
- Language-Conditioned Router: Dynamically controls the contribution of each subspace at the LoRA rank level
- Fine-grained Sharing: Enables sharing where languages are compatible, strict decoupling when conflicts occur
- Two-Stage Training: With Initial-B warm start for accelerated convergence
- Robust Performance: Works across both chunked and non-chunked encoder configurations
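The rank-level decoupling idea can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration: the split into shared and language-specific LoRA ranks, the `router` table, and the sigmoid/threshold gates are hypothetical stand-ins for the paper's Soft and Hard variants, not the released implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r_shared, r_lang, n_langs = 16, 16, 4, 2, 3

# Shared LoRA subspace (used by all languages) and per-language subspaces.
# B matrices start at zero (Initial-B warm start), so the initial update is zero.
A_shared = rng.normal(size=(r_shared, d_in)) * 0.01
B_shared = np.zeros((d_out, r_shared))
A_lang = rng.normal(size=(n_langs, r_lang, d_in)) * 0.01
B_lang = np.zeros((n_langs, d_out, r_lang))

# Hypothetical language-conditioned router: one logit per shared rank per language.
router = rng.normal(size=(n_langs, r_shared))

def lora_delta(lang_id, soft=True):
    """Synthesize the LoRA weight update for one language from both subspaces."""
    logits = router[lang_id]
    if soft:
        # "Soft" variant: sigmoid gates weight each shared rank continuously.
        gates = 1.0 / (1.0 + np.exp(-logits))
    else:
        # "Hard" variant: binary gates switch shared ranks on or off.
        gates = (logits > 0).astype(float)
    shared = B_shared @ np.diag(gates) @ A_shared      # rank-gated shared update
    specific = B_lang[lang_id] @ A_lang[lang_id]       # language-specific update
    return shared + specific

delta = lora_delta(lang_id=1)
print(delta.shape)  # (16, 16)
```

A "Static" variant would correspond to fixing the gates in advance rather than computing them from the language identity.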
Results
Experiments on a 12-language mixed-resource setting show that Zipper-LoRA consistently outperforms both fully shared and independent baselines, particularly in extremely low-resource scenarios.
TODO
- Paper
- Data
- Code (will be released after paper accepted)
- Model Weights (coming soon)
Citation
If you find this work helpful, please cite:
@article{ZipperLoRA2026,
  title={Dynamic Parameter Decoupling for Speech-LLM based Multilingual Speech Recognition},
  author={Mei, Yuxiang and Qiu, Delai and Liu, Shengping and Liang, Jiaen and Long, Yanhua},
  journal={arXiv preprint arXiv:2603.17558},
  year={2026}
}