---
license: cc-by-4.0
datasets:
- speechcolab/gigaspeech
- parler-tts/mls_eng_10k
- reach-vb/jenny_tts_dataset
language:
- en
- hi
base_model:
- openai-community/gpt2
pipeline_tag: text-to-speech
---

# Model Card for indri-0.1-125m-tts

Indri is a series of audio models that can do TTS, ASR, and audio continuation. This is the smallest model in the series and supports TTS in 2 languages:

1. English
2. Hindi

## Model Details

### Model Description

`indri-0.1-125m-tts` is a small, lightweight TTS model based on the transformer architecture. It models audio as tokens and can generate high-quality audio while consistently cloning the speaker's style.

### Key features

1. Based on the GPT-2 architecture
2. Supports voice cloning with small prompts
3. Accepts code-mixed text input in 2 languages: English and Hindi

### Model Sources

- **Repository:** <https://github.com/cmeraki/indri>
- **Demo:** <https://www.indrivoice.ai/>

## Technical details

Please read our blog [here]() for more technical details on how it was built.

Here is a brief overview of how the model works:

1. Converts input text into tokens
2. Runs autoregressive decoding on a GPT-2 based transformer model to generate audio tokens
3. Decodes the audio tokens (from [kyutai/mimi](https://huggingface.co/kyutai/mimi)) to audio

## How to Get Started with the Model

Use the code below to get started with the model.

## Training Details

### Training Data

[More Information Needed]

### Training Procedure

#### Preprocessing

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed]

#### Speeds, Sizes, Times

[More Information Needed]

## Citation

**BibTeX:**

[More Information Needed]
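
## Example: audio tokens in a shared vocabulary

The three steps listed under Technical details imply that a single GPT-2 style model emits audio tokens that are later handed to the Mimi decoder. The sketch below illustrates one way such a layout could work: audio codebook indices are shifted past the text vocabulary so that text and audio share one token space, and the shift is undone before decoding. Everything here is an assumption for illustration. The constants `CODEBOOK_SIZE` and `NUM_CODEBOOKS`, the per-frame interleaving, and both function names are hypothetical and are not taken from the Indri codebase; only the GPT-2 text vocabulary size (50257) is a known value.

```python
# Hypothetical layout; none of these choices are confirmed by the model card.
TEXT_VOCAB_SIZE = 50257  # size of the GPT-2 text vocabulary
CODEBOOK_SIZE = 2048     # assumed number of entries per audio codebook
NUM_CODEBOOKS = 2        # assumed number of codebooks kept per audio frame


def audio_to_flat_tokens(frames):
    """Flatten per-frame codebook indices into one token stream.

    frames: list of tuples, one tuple per audio frame, holding one index
    per codebook, each in the range [0, CODEBOOK_SIZE).
    """
    flat = []
    for frame in frames:
        for codebook, code in enumerate(frame):
            # Shift past the text vocabulary and past earlier codebooks.
            flat.append(TEXT_VOCAB_SIZE + codebook * CODEBOOK_SIZE + code)
    return flat


def flat_tokens_to_audio(tokens):
    """Invert audio_to_flat_tokens, ignoring any text tokens in the stream."""
    audio = [t - TEXT_VOCAB_SIZE for t in tokens if t >= TEXT_VOCAB_SIZE]
    # Drop a trailing partial frame, if generation stopped mid-frame.
    usable = len(audio) - len(audio) % NUM_CODEBOOKS
    frames = []
    for start in range(0, usable, NUM_CODEBOOKS):
        frame = tuple(
            audio[start + codebook] - codebook * CODEBOOK_SIZE
            for codebook in range(NUM_CODEBOOKS)
        )
        frames.append(frame)
    return frames


if __name__ == "__main__":
    frames = [(5, 100), (2047, 0)]
    flat = audio_to_flat_tokens(frames)
    print(flat)
    print(flat_tokens_to_audio(flat))
```

In a layout like this, the recovered frames would be passed to the audio codec's decoder to produce a waveform. The actual token layout used by the model may differ, so treat the repository linked above as the authoritative reference.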