
license: cc-by-4.0
datasets:
  - speechcolab/gigaspeech
  - parler-tts/mls_eng_10k
  - reach-vb/jenny_tts_dataset
language:
  - en
  - hi
base_model:
  - openai-community/gpt2
pipeline_tag: text-to-speech

Model Card for indri-0.1-125m-tts

Indri is a series of audio models that can perform TTS, ASR, and audio continuation. This is the smallest model in the series, and it supports TTS in two languages:

English
Hindi

Model Details

Model Description

indri-0.1-125m-tts is a novel, very small, lightweight TTS model based on the transformer architecture. It models audio as tokens and can generate high-quality speech while consistently cloning the speaker's style.

Key features

Based on GPT-2 architecture
Supports voice cloning with small prompts
Code-mixed text input in two languages: English and Hindi
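To show where a "125m" size comes from in a GPT-2 style model, the sketch below counts the parameters of a standard GPT-2 small configuration: 12 layers, 768-dimensional embeddings, 1024 positions, a 50,257-token vocabulary, and an LM head tied to the token embeddings. This is an illustration under assumptions. The card does not state Indri's exact configuration, and a TTS model that adds audio tokens to its vocabulary will have a larger embedding table, so its true count will differ. The function name `gpt2_param_count` is hypothetical.

```python
def gpt2_param_count(vocab_size: int, n_positions: int, n_embd: int, n_layer: int) -> int:
    """Count parameters of a GPT-2 style decoder whose LM head is tied to the token embeddings."""
    token_emb = vocab_size * n_embd                # token embedding table (wte)
    pos_emb = n_positions * n_embd                 # learned position embeddings (wpe)
    layer_norm = 2 * n_embd                        # LayerNorm weight + bias
    attn_qkv = n_embd * 3 * n_embd + 3 * n_embd    # fused Q, K, V projection
    attn_out = n_embd * n_embd + n_embd            # attention output projection
    mlp_up = n_embd * 4 * n_embd + 4 * n_embd      # MLP expansion to 4x width
    mlp_down = 4 * n_embd * n_embd + n_embd        # MLP projection back to n_embd
    per_layer = 2 * layer_norm + attn_qkv + attn_out + mlp_up + mlp_down
    final_norm = layer_norm
    # The LM head reuses the token embedding matrix, so it adds no extra parameters.
    return token_emb + pos_emb + n_layer * per_layer + final_norm


# Standard GPT-2 small configuration
print(gpt2_param_count(50257, 1024, 768, 12))  # 124439808
```

Each token added to the vocabulary costs `n_embd` parameters (768 at this width). Extending the vocabulary with audio tokens therefore grows the embedding table while leaving the transformer layers unchanged.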

Model Sources

Repository: https://github.com/cmeraki/indri
Demo: https://www.indrivoice.ai/

Technical details

Please read our blog here for more technical details on how it was built.

Here's a brief overview of how the model works:

Converts the input text into tokens
Runs autoregressive decoding on the GPT-2 based transformer model to generate audio tokens
Decodes the audio tokens to audio with kyutai/mimi
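The three steps above can be sketched as a generic autoregressive loop. This is a minimal illustration, not Indri's implementation. The tokenizer, the `next_token` function that stands in for the GPT-2 based model, and the `decode_audio` function that stands in for the Mimi decoder are all hypothetical callables supplied by the caller. Greedy decoding that stops at an end-of-sequence id is also an assumption, because the card does not say which sampling strategy the model uses.

```python
from typing import Callable, List, Sequence


def generate_audio_tokens(
    next_token: Callable[[Sequence[int]], int],
    prompt: Sequence[int],
    eos_id: int,
    max_new_tokens: int = 1024,
) -> List[int]:
    """Autoregressive decoding: each new token is fed back in as context for the next."""
    context = list(prompt)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        token = next_token(context)  # model predicts the next token from everything so far
        if token == eos_id:          # stop once the model emits end-of-sequence
            break
        context.append(token)
        generated.append(token)
    return generated


def text_to_speech(
    text: str,
    tokenize: Callable[[str], List[int]],
    next_token: Callable[[Sequence[int]], int],
    decode_audio: Callable[[List[int]], List[float]],
    eos_id: int,
) -> List[float]:
    text_tokens = tokenize(text)                                           # step 1: text -> tokens
    audio_tokens = generate_audio_tokens(next_token, text_tokens, eos_id)  # step 2: decode audio tokens
    return decode_audio(audio_tokens)                                      # step 3: audio tokens -> samples


# Toy usage with hypothetical stand-ins: characters become token ids, and the
# "model" emits 0 until the context holds 4 tokens, then emits EOS (-1).
samples = text_to_speech(
    "ab",
    tokenize=lambda s: [ord(c) for c in s],
    next_token=lambda ctx: -1 if len(ctx) >= 4 else 0,
    decode_audio=lambda toks: [float(t) for t in toks],
    eos_id=-1,
)
print(samples)  # [0.0, 0.0]
```

Passing the three stages in as callables keeps the control flow visible and shows that only the middle step is autoregressive. In a real system the decoding loop would typically also reuse cached attention state rather than reprocessing the full context at every step.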

How to Get Started with the Model

Use the code below to get started with the model.

Training Details

Training Data

[More Information Needed]

Training Procedure

Preprocessing

[More Information Needed]

Training Hyperparameters

Training regime: [More Information Needed]

Speeds, Sizes, Times

[More Information Needed]

Citation

BibTeX:

[More Information Needed]