metadata
license: cc-by-4.0
datasets:
- speechcolab/gigaspeech
- parler-tts/mls_eng_10k
- reach-vb/jenny_tts_dataset
language:
- en
- hi
base_model:
- openai-community/gpt2
pipeline_tag: text-to-speech
Model Card for indri-0.1-125m-tts
Indri is a series of audio models that can do TTS, ASR, and audio continuation. This is the smallest model in our series and supports TTS tasks in 2 languages:
- English
- Hindi
Model Details
Model Description
indri-0.1-125m-tts is a novel, extremely small, and lightweight TTS model based on the transformer architecture.
It models audio as tokens and can generate high-quality audio with consistent style cloning of the speaker.
Key features
- Based on GPT-2 architecture
- Supports voice cloning with small prompts
- Supports code-mixed text input in 2 languages: English and Hindi
Model Sources
- Repository: https://github.com/cmeraki/indri
- Demo: https://www.indrivoice.ai/
Technical details
Please read our blog here for more technical details on how it was built.
Here's a brief overview of how this model works:
- Converts input text into tokens
- Runs autoregressive decoding on a GPT-2-based transformer model to generate audio tokens
- Decodes the audio tokens (from kyutai/mimi) into audio
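The three steps above can be sketched in plain Python. This is a toy illustration only, not the Indri implementation: the byte-level tokenizer, the vocabulary layout (`AUDIO_OFFSET`, `AUDIO_VOCAB_SIZE`, `STOP_TOKEN`), and the `toy_next_token` stand-in for the transformer are all assumptions made for this example. The final conversion of codec indices into a waveform is done by the Mimi decoder and is not shown.

```python
from typing import Callable, List

# Hypothetical vocabulary layout for this sketch; the real model's differs.
AUDIO_OFFSET = 256                             # ids 0..255 are byte-level text tokens
AUDIO_VOCAB_SIZE = 2048                        # assumed codec codebook size
STOP_TOKEN = AUDIO_OFFSET + AUDIO_VOCAB_SIZE   # assumed end-of-audio marker


def text_to_tokens(text: str) -> List[int]:
    """Step 1: convert input text into token ids (toy byte-level tokenizer)."""
    return list(text.encode("utf-8"))


def generate_audio_tokens(
    prompt: List[int],
    next_token: Callable[[List[int]], int],
    max_new_tokens: int = 64,
) -> List[int]:
    """Step 2: autoregressive decoding.

    Each new token is predicted from every token produced so far and is
    appended to the sequence before the next prediction is made.
    """
    sequence = list(prompt)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        token = next_token(sequence)
        if token == STOP_TOKEN:
            break
        sequence.append(token)
        generated.append(token)
    return generated


def audio_tokens_to_codes(tokens: List[int]) -> List[int]:
    """Step 3 (first half): map token ids back to codec codebook indices."""
    return [t - AUDIO_OFFSET for t in tokens]


def toy_next_token(sequence: List[int]) -> int:
    """Deterministic stand-in for the transformer; stops after 5 audio tokens."""
    n_audio = sum(1 for t in sequence if t >= AUDIO_OFFSET)
    if n_audio >= 5:
        return STOP_TOKEN
    return AUDIO_OFFSET + (sum(sequence) % AUDIO_VOCAB_SIZE)


prompt = text_to_tokens("hi")
audio_tokens = generate_audio_tokens(prompt, toy_next_token)
codes = audio_tokens_to_codes(audio_tokens)
print(codes)  # codebook indices that a codec decoder would turn into audio
```

In the real model, `next_token` would be a forward pass through the GPT-2 based transformer followed by sampling, and the resulting codes would be passed to the Mimi decoder to produce a waveform.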
How to Get Started with the Model
Use the code below to get started with the model.