Soundwave: Less is More for Speech-Text Alignment in LLMs
GitHub | Paper | Online Demo
Soundwave is a speech-to-text model that bridges the gap between speech and text. It is trained on just 10k hours of data and delivers exceptional performance on speech translation and the AIR-Bench speech tasks.
Load the Soundwave model and run inference with your audio files as shown in the GitHub repository.
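The model-loading and inference code itself lives in the GitHub repository and is not reproduced here. As a minimal pre-flight sketch, assuming your audio is stored as WAV files, the snippet below lists the files in a folder and reports their basic properties before you hand them to the repository's inference code. `describe_wav` and `collect_wavs` are hypothetical helper names for illustration only; they are not part of Soundwave, and the snippet uses only the Python standard library.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return basic properties of one WAV file (hypothetical helper)."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "path": str(path),
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            # Duration in seconds; guard against a zero sample rate.
            "duration_s": frames / rate if rate else 0.0,
        }


def collect_wavs(directory):
    """Describe every .wav file directly inside `directory`, sorted by name."""
    return [describe_wav(p) for p in sorted(Path(directory).glob("*.wav"))]
```

Calling `collect_wavs("my_audio")` on a folder of recordings returns one dictionary per file. What sample rate or channel layout the model expects is not stated here, so check the repository's instructions before converting anything.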
@article{zhang2025soundwave,
title={Soundwave: Less is More for Speech-Text Alignment in LLMs},
author={Zhang, Yuhao and Liu, Zhiheng and Bu, Fan and Zhang, Ruiyu and Wang, Benyou and Li, Haizhou},
journal={arXiv preprint arXiv:2502.12900},
year={2025}
}