indri-0.1-124m-tts / README.md
metadata
license: cc-by-4.0
datasets:
  - speechcolab/gigaspeech
  - parler-tts/mls_eng_10k
  - reach-vb/jenny_tts_dataset
language:
  - en
  - hi
base_model:
  - openai-community/gpt2
pipeline_tag: text-to-speech

Model Card for indri-0.1-124m-tts

Indri is a series of audio models that can perform text-to-speech (TTS), automatic speech recognition (ASR), and audio continuation. This is the smallest model in the series and supports TTS in two languages:

  1. English
  2. Hindi

Model Details

Model Description

indri-0.1-124m-tts is a small, lightweight TTS model based on the transformer architecture. It models audio as tokens and can generate high-quality audio while consistently cloning the speaker's style.
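To make "models audio as tokens" concrete, here is a minimal sketch of one common way to place codec codes in the same vocabulary as text tokens: each codebook gets its own id range after the text vocabulary. This illustrates the general technique and is not necessarily how Indri lays out its vocabulary. `CODEBOOK_SIZE`, `NUM_CODEBOOKS`, and both function names are assumptions made for the example; only the GPT-2 text vocabulary size (50257) is a known constant.

```python
from typing import List

TEXT_VOCAB_SIZE = 50257  # GPT-2's text vocabulary size
CODEBOOK_SIZE = 2048     # assumed number of entries per codec codebook
NUM_CODEBOOKS = 2        # assumed number of codebooks kept per audio frame


def audio_to_token_ids(frames: List[List[int]]) -> List[int]:
    """Flatten per-frame codec codes into one sequence of vocabulary ids.

    frames[t][k] is the code from codebook k at time step t. Each codebook
    is shifted into its own id range so audio ids never collide with text ids.
    """
    ids = []
    for frame in frames:
        for k, code in enumerate(frame):
            ids.append(TEXT_VOCAB_SIZE + k * CODEBOOK_SIZE + code)
    return ids


def token_ids_to_audio(ids: List[int]) -> List[List[int]]:
    """Invert audio_to_token_ids, recovering the per-frame codec codes."""
    codes = [(i - TEXT_VOCAB_SIZE) % CODEBOOK_SIZE for i in ids]
    return [codes[t:t + NUM_CODEBOOKS] for t in range(0, len(codes), NUM_CODEBOOKS)]
```

With this layout, a single language-model head can predict both text and audio ids, and the audio ids can be mapped back to codec codes before decoding.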

Key features

  1. Based on the GPT-2 architecture
  2. Supports voice cloning from short prompts
  3. Accepts code-mixed text input in two languages: English and Hindi

Technical details

Please read our blog here for more technical details on how it was built.

Here is a brief overview of how the model works:

  1. Converts input text into tokens
  2. Runs autoregressive decoding on a GPT-2-based transformer model to generate audio tokens
  3. Decodes the audio tokens into a waveform using the kyutai/mimi codec
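The three steps above can be sketched as a toy pipeline. This is a schematic under stated assumptions, not the Indri implementation: `tokenize` is a byte-level stand-in for the real tokenizer, `toy_next_token` replaces the GPT-2-based transformer with a random sampler, and `decode_audio` replaces the Mimi decoder with a trivial mapping. All names and vocabulary sizes here are hypothetical.

```python
import random
from typing import Callable, List

# Hypothetical vocabulary layout: ids below TEXT_VOCAB are text tokens,
# ids in [AUDIO_OFFSET, STOP) are audio tokens, and STOP ends generation.
TEXT_VOCAB = 256
AUDIO_VOCAB = 2048
AUDIO_OFFSET = TEXT_VOCAB
STOP = AUDIO_OFFSET + AUDIO_VOCAB


def tokenize(text: str) -> List[int]:
    """Step 1: map text to token ids (byte-level stand-in for a real tokenizer)."""
    return list(text.encode("utf-8"))


def toy_next_token(context: List[int], rng: random.Random) -> int:
    """Stand-in for the transformer: pick the next token id given the context.

    Emits random audio tokens and stops once there are twice as many
    audio tokens as text tokens.
    """
    n_audio = sum(1 for t in context if AUDIO_OFFSET <= t < STOP)
    n_text = len(context) - n_audio
    if n_audio >= 2 * n_text:
        return STOP
    return AUDIO_OFFSET + rng.randrange(AUDIO_VOCAB)


def generate_audio_tokens(
    prompt: List[int],
    next_token: Callable[[List[int]], int],
    max_new_tokens: int = 512,
) -> List[int]:
    """Step 2: autoregressive decoding; each new token is fed back as context."""
    context = list(prompt)
    audio = []
    for _ in range(max_new_tokens):
        tok = next_token(context)
        if tok == STOP:
            break
        context.append(tok)
        audio.append(tok - AUDIO_OFFSET)  # back to a codec code
    return audio


def decode_audio(codes: List[int], samples_per_code: int = 4) -> List[float]:
    """Step 3: stand-in for the codec decoder; maps codes to samples in [-1, 1]."""
    wave = []
    for c in codes:
        wave.extend([c / (AUDIO_VOCAB - 1) * 2 - 1] * samples_per_code)
    return wave


def synthesize(text: str, seed: int = 0) -> List[float]:
    """Run all three steps end to end with the toy components."""
    rng = random.Random(seed)
    prompt = tokenize(text)
    codes = generate_audio_tokens(prompt, lambda ctx: toy_next_token(ctx, rng))
    return decode_audio(codes)
```

In a real system the text prompt and the generated audio tokens share one context window, which is why the loop appends every generated token back onto `context` before asking for the next one.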

How to Get Started with the Model

Use the code below to get started with the model.

Training Details

Training Data

[More Information Needed]

Training Procedure

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

  • Training regime: [More Information Needed]

Speeds, Sizes, Times [optional]

[More Information Needed]

Citation [optional]

BibTeX:

[More Information Needed]