aoxo commited on
Commit
f3483e8
·
verified ·
1 Parent(s): 12546e5

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +4 -0
README.md CHANGED
@@ -18,6 +18,10 @@ This repository contains the **Swaram (mal)** text-to-speech (TTS) model checkpo
18
 
19
  Swaram's text encoder is built on top of the **Wav2Vec2 decoder**. A **VAE** is used as the decoder. A **flow-based module** predicts **spectrogram-based acoustic features**, which is composed of the **Transformer-based Contextualizer** and cascaded dense layers. The spectrogram is then transformed into a speech waveform using a stack of **transposed convolutional layers**. To capture the one-to-many nature of TTS, where the same text can be spoken in multiple ways, the model also includes a stochastic duration predictor, allowing for varied speech rhythms from the same text input.
20
 
 
 
 
 
21
  ## Usage
22
 
23
  ```
 
18
 
19
  Swaram's text encoder is built on top of the **Wav2Vec2 decoder**. A **VAE** is used as the decoder. A **flow-based module** predicts **spectrogram-based acoustic features**, which is composed of the **Transformer-based Contextualizer** and cascaded dense layers. The spectrogram is then transformed into a speech waveform using a stack of **transposed convolutional layers**. To capture the one-to-many nature of TTS, where the same text can be spoken in multiple ways, the model also includes a stochastic duration predictor, allowing for varied speech rhythms from the same text input.
20
 
21
+ ## Architecture
22
+
23
+ ![architecture](architecture.png)
24
+
25
  ## Usage
26
 
27
  ```