VITS TTS for Indian Languages
This repository contains a VITS-based text-to-speech (TTS) model fine-tuned for Indian languages. The model supports 13 Indian languages and a range of speaking styles and emotions, which makes it suitable for use cases such as conversational AI and audiobooks.
Model Overview
The model ai4bharat/vits_rasa_13 is based on the VITS architecture and supports the following features:
- Languages: 13 Indian languages (listed under Supported Languages).
- Styles: 14 speaking styles and emotions, each selected by a style ID.
- Speaker IDs: 20 predefined speaker profiles covering male and female voices. Not every language has both.
Installation
pip install transformers torch soundfile
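To confirm the environment is ready before loading the model, a small check like the following may help. This is only a sketch: `missing_packages` is a hypothetical helper name, and the package list is taken from the imports in the usage example (`transformers`, `torch`, `soundfile`).

```python
from importlib.util import find_spec

# Packages imported by the usage example in this README.
REQUIRED = ("transformers", "torch", "soundfile")


def missing_packages(names=REQUIRED):
    """Return the package names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, try: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```

The check only looks the modules up and does not import them, so it runs quickly even on machines without a GPU.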
Usage
Here's a quick example to get started:
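import soundfile as sf
import torch
from transformers import AutoModel, AutoTokenizer

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
# trust_remote_code=True is needed because the model repository contains custom code.
model = AutoModel.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True).to(device)
tokenizer = AutoTokenizer.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True)

text = "ਕੀ ਮੈਂ ਇਸ ਹਫਤੇ ਦੇ ਅੰਤ ਵਿੱਚ ਰੁੱਝਿਆ ਹੋਇਆ ਹਾਂ?"  # Example text in Punjabi
speaker_id = 16  # PAN_M
style_id = 0  # ALEXA
inputs = tokenizer(text=text, return_tensors="pt").to(device)
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(inputs["input_ids"], speaker_id=speaker_id, emotion_id=style_id)

# Move the waveform to the CPU and convert it to NumPy before writing the WAV file.
sf.write("audio.wav", outputs.waveform.squeeze().cpu().numpy(), model.config.sampling_rate)
print(outputs.waveform.shape)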
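To compare how one speaker sounds across several styles, the same call can be wrapped in a loop. The sketch below assumes the call signature shown in the example above (`speaker_id=` and `emotion_id=` keyword arguments, and a `waveform` attribute on the output). `synthesize_styles` is a hypothetical helper, not part of the model's API.

```python
def synthesize_styles(model, tokenizer, text, speaker_id, style_ids, device="cpu"):
    """Synthesize `text` once per style ID and return {style_id: waveform}.

    `model` and `tokenizer` are expected to be the objects loaded in the
    quick example; the text is tokenized once and reused for every style.
    """
    inputs = tokenizer(text=text, return_tensors="pt").to(device)
    waveforms = {}
    for style_id in style_ids:
        outputs = model(inputs["input_ids"], speaker_id=speaker_id, emotion_id=style_id)
        waveforms[style_id] = outputs.waveform.squeeze()
    return waveforms
```

For example, `synthesize_styles(model, tokenizer, text, 16, [0, 8])` would return the PAN_M voice in the ALEXA and HAPPY styles. Each waveform can then be written with `sf.write` as in the example above.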
Supported Languages
- Assamese
- Bengali
- Bodo
- Dogri
- Kannada
- Maithili
- Malayalam
- Marathi
- Nepali
- Punjabi
- Sanskrit
- Tamil
- Telugu
Speaker-Style Identifier Overview
Speaker Name | Speaker ID |
---|---|
ASM_F | 0 |
ASM_M | 1 |
BEN_F | 2 |
BEN_M | 3 |
BRX_F | 4 |
BRX_M | 5 |
DOI_F | 6 |
DOI_M | 7 |
KAN_F | 8 |
KAN_M | 9 |
MAI_M | 10 |
MAL_F | 11 |
MAR_F | 12 |
MAR_M | 13 |
NEP_F | 14 |
PAN_F | 15 |
PAN_M | 16 |
SAN_M | 17 |
TAM_F | 18 |
TEL_F | 19 |
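The speaker table can be kept as a lookup so that code refers to voices by language and gender rather than by raw number. In this sketch the IDs are copied from the table above. The mapping from three-letter prefixes to language names is an assumption inferred from the Supported Languages list, and `speakers_for` is a hypothetical helper.

```python
# Speaker IDs copied from the table above.
SPEAKER_IDS = {
    "ASM_F": 0, "ASM_M": 1, "BEN_F": 2, "BEN_M": 3, "BRX_F": 4,
    "BRX_M": 5, "DOI_F": 6, "DOI_M": 7, "KAN_F": 8, "KAN_M": 9,
    "MAI_M": 10, "MAL_F": 11, "MAR_F": 12, "MAR_M": 13, "NEP_F": 14,
    "PAN_F": 15, "PAN_M": 16, "SAN_M": 17, "TAM_F": 18, "TEL_F": 19,
}

# Assumed prefix-to-language mapping (not stated explicitly in this README).
LANGUAGE_CODES = {
    "ASM": "Assamese", "BEN": "Bengali", "BRX": "Bodo", "DOI": "Dogri",
    "KAN": "Kannada", "MAI": "Maithili", "MAL": "Malayalam",
    "MAR": "Marathi", "NEP": "Nepali", "PAN": "Punjabi",
    "SAN": "Sanskrit", "TAM": "Tamil", "TEL": "Telugu",
}


def speakers_for(language):
    """Return {gender: speaker_id} for a language name such as 'Punjabi'."""
    voices = {}
    for name, speaker_id in SPEAKER_IDS.items():
        code, gender = name.split("_")
        if LANGUAGE_CODES[code] == language:
            voices["female" if gender == "F" else "male"] = speaker_id
    if not voices:
        raise ValueError(f"No speaker found for language {language!r}")
    return voices


print(speakers_for("Punjabi"))   # {'female': 15, 'male': 16}
print(speakers_for("Maithili"))  # {'male': 10}
```

Several languages have only one voice in the table (Maithili, for example, has only a male speaker), so callers should check which genders are present before picking one.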
Style Name | Style ID |
---|---|
ALEXA | 0 |
ANGER | 1 |
BB | 2 |
BOOK | 3 |
CONV | 4 |
DIGI | 5 |
DISGUST | 6 |
FEAR | 7 |
HAPPY | 8 |
NEWS | 10 |
SAD | 12 |
SURPRISE | 14 |
UMANG | 15 |
WIKI | 16 |
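The style IDs in the table are not contiguous, so iterating over a plain numeric range would hit IDs that the table does not list. A name-based lookup avoids this. The sketch below copies the IDs from the table; `style_id` and `unlisted_ids` are hypothetical helper names.

```python
# Style IDs copied from the table above.
STYLE_IDS = {
    "ALEXA": 0, "ANGER": 1, "BB": 2, "BOOK": 3, "CONV": 4, "DIGI": 5,
    "DISGUST": 6, "FEAR": 7, "HAPPY": 8, "NEWS": 10, "SAD": 12,
    "SURPRISE": 14, "UMANG": 15, "WIKI": 16,
}


def style_id(name):
    """Look up a style ID by name, ignoring case and surrounding spaces."""
    key = name.strip().upper()
    if key not in STYLE_IDS:
        raise ValueError(f"Unknown style {name!r}; expected one of {sorted(STYLE_IDS)}")
    return STYLE_IDS[key]


def unlisted_ids():
    """Return the IDs below the maximum that the table does not list."""
    listed = set(STYLE_IDS.values())
    return [i for i in range(max(listed) + 1) if i not in listed]


print(style_id("happy"))  # 8
print(unlisted_ids())     # [9, 11, 13]
```

The result of `style_id` can be passed as the `emotion_id` argument shown in the usage example.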
Citation
If you use this model in your research, please cite:
@article{ai4bharat_vits_rasa_13,
title={VITS TTS for Indian Languages},
author={Ashwin Sankar},
year={2024},
publisher={Hugging Face}
}