FreeSVC: Zero-shot Multilingual Singing Voice Conversion

FreeSVC is a state-of-the-art multilingual singing voice conversion model designed for zero-shot use. It converts singing voices across several languages without requiring extensive language-specific training. The source code is available in the project's GitHub repository.

Supported Languages

Language    ID  Status      Speech data      Singing data
Chinese      0  ✅ Full     255h             70h
Dutch        1  ✅ Full     Part of CML-TTS  -
English      2  ✅ Full     921h             47h
French       3  ✅ Full     Part of CML-TTS  -
German       4  ✅ Full     Part of CML-TTS  -
Italian      5  ✅ Full     Part of CML-TTS  -
Japanese     6  ✅ Full     30h              -
Other*       7  ⚠️ Partial  -                10h
Polish       8  ✅ Full     Part of CML-TTS  -
Portuguese   9  ✅ Full     Part of CML-TTS  -
Spanish     10  ✅ Full     Part of CML-TTS  -

*Note: The "Other" category (ID 7) covers recordings of vocal techniques that carry no linguistic content.
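If you need to refer to these IDs from your own scripts, a small lookup like the one below can help. It is a hypothetical helper, not part of FreeSVC's published code: the names `LANGUAGE_IDS` and `language_id` are assumptions, and only the name-to-ID pairs are taken from the table above.

```python
# Name-to-ID pairs copied from the Supported Languages table.
# LANGUAGE_IDS and language_id() are hypothetical helpers, not FreeSVC's API.
LANGUAGE_IDS = {
    "Chinese": 0,
    "Dutch": 1,
    "English": 2,
    "French": 3,
    "German": 4,
    "Italian": 5,
    "Japanese": 6,
    "Other": 7,  # vocal techniques without linguistic content
    "Polish": 8,
    "Portuguese": 9,
    "Spanish": 10,
}

# Case-insensitive view of the same mapping.
_LOOKUP = {name.lower(): idx for name, idx in LANGUAGE_IDS.items()}


def language_id(name: str) -> int:
    """Return the numeric ID for a language name, ignoring case and padding."""
    try:
        return _LOOKUP[name.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported language: {name!r}") from None
```

For example, `language_id("english")` returns 2. In the table, the IDs follow the alphabetical order of the English language names; the mapping is still spelled out explicitly so that it stays correct if a language is ever added out of order.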

Model Overview

FreeSVC builds on an enhanced VITS architecture combined with Speaker-invariant Clustering (SPIN) and the ECAPA2 speaker encoder. Together, these components separate speaker characteristics from linguistic content, which is what allows FreeSVC to produce high-quality, natural-sounding conversions across multiple languages.
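The idea of separating "who is singing" from "what is sung" can be illustrated with a much simpler stand-in. The sketch below does not use SPIN, ECAPA2 or VITS. Instead, it swaps in AdaIN-style instance normalization: the per-channel mean and standard deviation of a feature matrix act as a crude speaker signature, and the normalized features act as content. All function names are hypothetical, and the random arrays stand in for real acoustic features.

```python
import numpy as np

# Toy AdaIN-style disentanglement. This is NOT FreeSVC's method, which uses
# SPIN content features, ECAPA2 speaker embeddings and a VITS-based decoder.
EPS = 1e-5  # avoids division by zero for constant channels


def speaker_stats(features: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-channel mean and std over time for a (channels, frames) array."""
    mean = features.mean(axis=1, keepdims=True)
    std = features.std(axis=1, keepdims=True)
    return mean, std


def content(features: np.ndarray) -> np.ndarray:
    """Strip per-channel statistics (instance normalization)."""
    mean, std = speaker_stats(features)
    return (features - mean) / (std + EPS)


def convert(source: np.ndarray, target_reference: np.ndarray) -> np.ndarray:
    """Re-apply the target reference's statistics to the source content."""
    mean, std = speaker_stats(target_reference)
    return content(source) * std + mean


rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(4, 200))  # 4 channels x 200 frames
target = rng.normal(3.0, 0.5, size=(4, 120))  # a "voice" with other statistics
converted = convert(source, target)
```

The converted features keep the frame-by-frame shape of the source while taking on the target's per-channel statistics. Because the target voice enters only through statistics computed from a reference clip, no training on that voice is needed, which loosely mirrors the role a learned speaker embedding such as ECAPA2 plays in a zero-shot system.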

Training Datasets

FreeSVC was trained on a diverse set of speech and singing datasets covering multiple languages:

Dataset       Hours       Language           Type
AISHELL-1     170h        Chinese            Speech
AISHELL-3     85h         Chinese            Speech
CML-TTS       3.1k h      7 languages        Speech
HiFiTTS       292h        English            Speech
JVS           30h         Japanese           Speech
LibriTTS-R    585h        English            Speech
NUS (NHSS)    7h          English            Speech, Singing
OpenSinger    50h         Chinese            Singing
Opencpop      5h          Chinese            Singing
PopBuTFy      10h / 40h   Chinese / English  Singing
POPCS         5h          Chinese            Singing
VCTK          44h         English            Speech
VocalSet      10h         Other              Singing
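The per-language hours in the Supported Languages table can be reproduced from this dataset list. The short script below sums hours per language and data type. It leaves out CML-TTS, whose 3.1k hours are not broken down by language here, and it counts the 7h of NUS (NHSS) as singing only; that last choice is an assumption, but it is the one that makes the English totals match (921h of speech, 47h of singing).

```python
from collections import defaultdict

# (dataset, hours, language, type) rows transcribed from the table above.
# CML-TTS is omitted (no per-language split given). NUS (NHSS) is counted as
# singing only, an assumption that reproduces the published English totals.
DATASETS = [
    ("AISHELL-1", 170, "Chinese", "speech"),
    ("AISHELL-3", 85, "Chinese", "speech"),
    ("HiFiTTS", 292, "English", "speech"),
    ("JVS", 30, "Japanese", "speech"),
    ("LibriTTS-R", 585, "English", "speech"),
    ("NUS (NHSS)", 7, "English", "singing"),
    ("OpenSinger", 50, "Chinese", "singing"),
    ("Opencpop", 5, "Chinese", "singing"),
    ("PopBuTFy", 10, "Chinese", "singing"),
    ("PopBuTFy", 40, "English", "singing"),
    ("POPCS", 5, "Chinese", "singing"),
    ("VCTK", 44, "English", "speech"),
    ("VocalSet", 10, "Other", "singing"),
]


def hours_by_language(rows):
    """Sum hours for each (language, type) pair."""
    totals = defaultdict(int)
    for _name, hours, language, kind in rows:
        totals[(language, kind)] += hours
    return dict(totals)


totals = hours_by_language(DATASETS)
for (language, kind), hours in sorted(totals.items()):
    print(f"{language:<9} {kind:<8} {hours:>4}h")
```

The resulting sums (Chinese 255h speech and 70h singing, English 921h and 47h, Japanese 30h speech, Other 10h singing) agree with the Supported Languages table.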

Citation

@misc{}