Update README.md
README.md
CHANGED
@@ -24,12 +24,6 @@ datasets:
 <img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg" alt="Open in HuggingFace"/>
 </a>
 
-* **Fine-tuning guide on Colab:**
-
-<a target="_blank" href="https://colab.research.google.com/github/ylacombe/scripts_and_notebooks/blob/main/Finetuning_Parler_TTS_on_a_single_speaker_dataset.ipynb">
-<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
-</a>
-
 **Parler-TTS Large v1** is a 2.2B-parameters text-to-speech (TTS) model, trained on 45K hours of audio data, that can generate high-quality, natural sounding speech with features that can be controlled using a simple text prompt (e.g. gender, background noise, speaking rate, pitch and reverberation).
 It is the second released model from the [Parler-TTS](https://github.com/huggingface/parler-tts) project, which aims to provide the community with TTS training resources and dataset pre-processing code.
 
@@ -38,6 +32,7 @@ It is the second released model from the [Parler-TTS](https://github.com/hugging
 * [🎲 Using a random voice](#🎲-random-voice)
 * [🎯 Using a specific speaker](#🎯-using-a-specific-speaker)
 * [Motivation](#motivation)
+* [Optimizing inference](https://github.com/huggingface/parler-tts/blob/main/INFERENCE.md)
 
 ## 🛠️ Usage
 
@@ -105,6 +100,7 @@ sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
 ```
 
 **Tips**:
+* We've set up an [inference guide](https://github.com/huggingface/parler-tts/blob/main/INFERENCE.md) to make generation faster. Think SDPA, torch.compile, batching and streaming!
 * Include the term "very clear audio" to generate the highest quality audio, and "very noisy audio" for high levels of background noise
 * Punctuation can be used to control the prosody of the generations, e.g. use commas to add small breaks in speech
 * The remaining speech features (gender, speaking rate, pitch and reverberation) can be controlled directly through the prompt