metadata

license: cc-by-4.0
language:
  - as
  - bn
  - brx
  - doi
  - kn
  - mai
  - ml
  - mr
  - ne
  - pa
  - sa
  - ta
  - te
library_name: transformers
pipeline_tag: text-to-speech
tags:
  - text-to-speech

VITS TTS for Indian Languages

This repository contains a VITS-based Text-to-Speech (TTS) model fine-tuned for 13 Indian languages. It covers a range of speaking styles and emotions, which makes it suitable for use cases such as conversational AI and audiobooks.

Model Overview

The model ai4bharat/vits_rasa_13 is based on the VITS architecture and supports the following features:

Languages: 13 Indian languages (see Supported Languages).
Styles: 14 speaking styles and emotions, selected by style ID.
Speaker IDs: 20 predefined speaker profiles with male and female voices, selected by speaker ID.

Installation

pip install transformers torch soundfile

Usage

Here's a quick example to get started:

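import soundfile as sf
import torch
from transformers import AutoModel, AutoTokenizer

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True runs the custom model code shipped with the repository.
model = AutoModel.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True).to(device)
tokenizer = AutoTokenizer.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True)

text = "ਕੀ ਮੈਂ ਇਸ ਹਫਤੇ ਦੇ ਅੰਤ ਵਿੱਚ ਰੁੱਝਿਆ ਹੋਇਆ ਹਾਂ?"  # Example text in Punjabi: "Am I busy this weekend?"
speaker_id = 16  # PAN_M
style_id = 0  # ALEXA

inputs = tokenizer(text=text, return_tensors="pt").to(device)
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(inputs['input_ids'], speaker_id=speaker_id, emotion_id=style_id)

# Assumes outputs.waveform is a torch tensor: move it to the CPU and convert to NumPy before writing.
waveform = outputs.waveform.squeeze().cpu().numpy()
sf.write("audio.wav", waveform, model.config.sampling_rate)
print(outputs.waveform.shape)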

Supported Languages

Assamese
Bengali
Bodo
Dogri
Kannada
Maithili
Malayalam
Marathi
Nepali
Punjabi
Sanskrit
Tamil
Telugu
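
When a script accepts a language code as input, a small lookup table can reject unsupported languages before the model is loaded. The sketch below pairs the ISO 639 codes from this card's metadata with the language names listed above. `LANGUAGES` and `language_name` are hypothetical names used for illustration and are not part of the model's API.

```python
from typing import Optional

# ISO 639 codes from the card metadata, paired with the names listed above.
LANGUAGES = {
    "as": "Assamese", "bn": "Bengali", "brx": "Bodo", "doi": "Dogri",
    "kn": "Kannada", "mai": "Maithili", "ml": "Malayalam", "mr": "Marathi",
    "ne": "Nepali", "pa": "Punjabi", "sa": "Sanskrit", "ta": "Tamil",
    "te": "Telugu",
}


def language_name(code: str) -> Optional[str]:
    """Return the language name for a code, or None if it is not listed."""
    return LANGUAGES.get(code.strip().lower())


print(language_name("pa"))  # Punjabi
print(language_name("hi"))  # None: Hindi is not in the list above
```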

Speaker-Style Identifier Overview

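Speakers and styles are listed separately; a speaker ID and a style ID are passed to the model independently, as in the usage example (speaker 16, PAN_M, with style 0, ALEXA). Style IDs are not contiguous: 9, 11, and 13 do not appear in the list.

Speaker Name	Speaker ID
ASM_F	0
ASM_M	1
BEN_F	2
BEN_M	3
BRX_F	4
BRX_M	5
DOI_F	6
DOI_M	7
KAN_F	8
KAN_M	9
MAI_M	10
MAL_F	11
MAR_F	12
MAR_M	13
NEP_F	14
PAN_F	15
PAN_M	16
SAN_M	17
TAM_F	18
TEL_F	19

Style Name	Style ID
ALEXA	0
ANGER	1
BB	2
BOOK	3
CONV	4
DIGI	5
DISGUST	6
FEAR	7
HAPPY	8
NEWS	10
SAD	12
SURPRISE	14
UMANG	15
WIKI	16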
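
Because the model takes numeric IDs, it can help to resolve names to IDs in code and fail early on a typo. The following is a minimal sketch that copies the values from the listing above. `SPEAKER_IDS`, `STYLE_IDS`, and `resolve_ids` are hypothetical helpers, not part of the model's API, and the sketch does not check whether a given speaker was trained on a given style.

```python
from typing import Tuple

# Speaker and style IDs copied from the listing above.
SPEAKER_IDS = {
    "ASM_F": 0, "ASM_M": 1, "BEN_F": 2, "BEN_M": 3, "BRX_F": 4,
    "BRX_M": 5, "DOI_F": 6, "DOI_M": 7, "KAN_F": 8, "KAN_M": 9,
    "MAI_M": 10, "MAL_F": 11, "MAR_F": 12, "MAR_M": 13, "NEP_F": 14,
    "PAN_F": 15, "PAN_M": 16, "SAN_M": 17, "TAM_F": 18, "TEL_F": 19,
}

# Note that style IDs 9, 11, and 13 do not appear in the listing.
STYLE_IDS = {
    "ALEXA": 0, "ANGER": 1, "BB": 2, "BOOK": 3, "CONV": 4, "DIGI": 5,
    "DISGUST": 6, "FEAR": 7, "HAPPY": 8, "NEWS": 10, "SAD": 12,
    "SURPRISE": 14, "UMANG": 15, "WIKI": 16,
}


def resolve_ids(speaker: str, style: str) -> Tuple[int, int]:
    """Map a speaker name and a style name to (speaker_id, style_id)."""
    speaker_key = speaker.strip().upper()
    style_key = style.strip().upper()
    if speaker_key not in SPEAKER_IDS:
        raise ValueError(f"Unknown speaker: {speaker!r}")
    if style_key not in STYLE_IDS:
        raise ValueError(f"Unknown style: {style!r}")
    return SPEAKER_IDS[speaker_key], STYLE_IDS[style_key]


speaker_id, style_id = resolve_ids("PAN_M", "ALEXA")
print(speaker_id, style_id)  # 16 0
```

The returned pair can be passed as `speaker_id` and `emotion_id` in the usage example.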

Citation

If you use this model in your research, please cite:

@misc{ai4bharat_vits_rasa_13,
  title={VITS TTS for Indian Languages},
  author={Ashwin Sankar},
  year={2024},
  publisher={Hugging Face}
}