Update README.md
README.md
````diff
@@ -18,6 +18,10 @@ This repository contains the **Swaram (mal)** text-to-speech (TTS) model checkpo
 
 Swaram's text encoder is built on top of the **Wav2Vec2 decoder**. A **VAE** is used as the decoder. A **flow-based module**, composed of the **Transformer-based Contextualizer** and cascaded dense layers, predicts **spectrogram-based acoustic features**. The spectrogram is then transformed into a speech waveform by a stack of **transposed convolutional layers**. To capture the one-to-many nature of TTS, where the same text can be spoken in multiple ways, the model also includes a stochastic duration predictor, which allows varied speech rhythms from the same text input.
 
+## Architecture
+
+![architecture](architecture.png)
+
 ## Usage
 
 ```
````
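The architecture paragraph says a stochastic duration predictor lets the same text be spoken with different rhythms. The sketch below illustrates that idea in plain Python, under loudly stated assumptions. It is not Swaram's implementation. The helpers `sample_durations` and `expand_to_frames` are hypothetical. The randomness here is fixed Gaussian noise added to per-token log-durations, whereas a learned (for example flow-based) predictor would model this distribution itself. Each token's encoder output is then repeated once per predicted frame, which gives the frame-rate sequence that later stages turn into a spectrogram and waveform.

```python
import math
import random
from typing import List, Optional, Sequence


def sample_durations(
    log_durations: Sequence[float],
    noise_scale: float = 0.8,
    seed: Optional[int] = None,
) -> List[int]:
    """Convert per-token log-durations into integer frame counts.

    Hypothetical simplification: Gaussian noise scaled by ``noise_scale`` is
    added in log space, so the same input can yield different rhythms. A learned
    stochastic duration predictor models this distribution instead of using
    fixed Gaussian noise.
    """
    rng = random.Random(seed)
    durations: List[int] = []
    for log_d in log_durations:
        noisy = log_d + noise_scale * rng.gauss(0.0, 1.0)
        # Round to whole frames; every token keeps at least one frame.
        durations.append(max(1, round(math.exp(noisy))))
    return durations


def expand_to_frames(
    token_features: Sequence[Sequence[float]],
    durations: Sequence[int],
) -> List[List[float]]:
    """Repeat each token's feature vector once per predicted frame."""
    if len(token_features) != len(durations):
        raise ValueError("expected exactly one duration per token")
    frames: List[List[float]] = []
    for features, count in zip(token_features, durations):
        # Build a fresh list per frame so later per-frame edits do not alias.
        frames.extend(list(features) for _ in range(count))
    return frames


if __name__ == "__main__":
    # Toy encoder outputs for three tokens (made-up 2-D features).
    tokens = [[0.1, 0.0], [0.2, 0.5], [0.3, 1.0]]
    log_d = [math.log(3.0), math.log(1.0), math.log(2.0)]
    for seed in (0, 1):
        d = sample_durations(log_d, noise_scale=0.5, seed=seed)
        print(seed, d, len(expand_to_frames(tokens, d)))
```

With `noise_scale=0.0` the durations are deterministic. Different seeds with a nonzero noise scale generally give different frame counts for the same tokens, which is the "varied speech rhythms" behaviour in miniature. Rounding to whole frames with a one-frame floor is an illustrative choice; real systems may use ceiling or an extra length scale instead.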