SpeechT5 TTS Turkish
This model is a fine-tuned version of microsoft/speecht5_tts on the turkishvoicedataset dataset. It achieves the following results on the evaluation set:
- Loss: 0.3079
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 6000
- mixed_precision_training: Native AMP
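The linear schedule with 500 warmup steps can be illustrated with a short pure-Python sketch. It assumes the schedule uses the same multiplier as transformers' `get_linear_schedule_with_warmup`: the learning rate ramps linearly from 0 to 1e-05 over the warmup steps, then decays linearly to 0 at step 6000. It also shows that the total train batch size of 32 is 16 × 2 gradient-accumulation steps, which implies training on a single device. The function name `linear_warmup_lr` is hypothetical and not part of any library.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-5
WARMUP_STEPS = 500
TRAINING_STEPS = 6000
TRAIN_BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 2


def linear_warmup_lr(step: int) -> float:
    """Learning rate after `step` optimizer steps (hypothetical helper).

    Assumes the transformers-style linear schedule: linear ramp from 0 to
    LEARNING_RATE during warmup, then linear decay to 0 at TRAINING_STEPS.
    """
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


# Examples seen per optimizer step, assuming one device:
# per-device batch size times gradient-accumulation steps.
total_train_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS

for step in (0, 250, 500, 3250, 6000):
    print(f"step {step:>4}: lr = {linear_warmup_lr(step):.2e}")
print("total_train_batch_size:", total_train_batch_size)
```

The peak learning rate is reached exactly at step 500, and the halfway point of the decay phase (step 3250) sits at half the peak.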
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.4436 | 1.8484 | 1000 | 0.3752 |
0.3822 | 3.6969 | 2000 | 0.3403 |
0.3729 | 5.5453 | 3000 | 0.3233 |
0.3451 | 7.3937 | 4000 | 0.3153 |
0.3315 | 9.2421 | 5000 | 0.3099 |
0.3492 | 11.0906 | 6000 | 0.3079 |
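The Epoch column makes it possible to estimate how many optimizer steps make up one epoch. The sketch below divides each step count by its epoch value; every row gives about 541 steps, which at 32 examples per step suggests a training split of roughly 17,300 examples. This is only an estimate derived from the table, assuming a single device and ignoring a possibly partial final batch; the card itself does not state the dataset size.

```python
# (step, epoch) pairs copied from the training-results table above.
checkpoints = [
    (1000, 1.8484),
    (2000, 3.6969),
    (3000, 5.5453),
    (4000, 7.3937),
    (5000, 9.2421),
    (6000, 11.0906),
]

# Assumption: a single device, so each optimizer step sees 16 * 2 = 32 examples.
total_train_batch_size = 16 * 2

# Optimizer steps per epoch implied by each row of the table.
steps_per_epoch = [step / epoch for step, epoch in checkpoints]
avg_steps = sum(steps_per_epoch) / len(steps_per_epoch)

# Rough training-set size: whole steps per epoch times examples per step.
approx_examples = round(avg_steps) * total_train_batch_size

print(f"~{avg_steps:.1f} optimizer steps per epoch")
print(f"~{approx_examples} training examples (estimate)")
```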
Framework versions
- Transformers 4.45.0.dev0
- Pytorch 2.4.1+cu121
- Datasets 3.0.0
- Tokenizers 0.19.1
Usage
Installation
```
!pip install transformers sentencepiece datasets soundfile speechbrain
```
Inference
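```python
from transformers import pipeline
from datasets import load_dataset
import soundfile as sf
import torch
from IPython.display import Audio

# Load the fine-tuned model; for SpeechT5 the pipeline also loads the default
# microsoft/speecht5_hifigan vocoder to turn spectrograms into waveforms
synthesiser = pipeline("text-to-speech", "umarigan/speecht5_tts_tr_v1.0")

# SpeechT5 is conditioned on a speaker embedding: take one precomputed
# embedding (row 736) and add a batch dimension
embeddings_dataset = load_dataset("umarigan/turkish_voice_dataset_embedded", split="train")
speaker_embedding = torch.tensor(embeddings_dataset[736]["speaker_embeddings"]).unsqueeze(0)

# Synthesize speech using the embedding. The input is a Turkish tongue twister:
# "A barber said to a barber: come, let's set up a barber shop together"
speech = synthesiser("Bir berber bir berbere gel beraber bir berber kuralım demiş", forward_params={"speaker_embeddings": speaker_embedding})

# Save the generated audio to a file
sf.write("speech.wav", speech["audio"], samplerate=speech["sampling_rate"])

# Play the audio in the notebook
Audio("speech.wav")
```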