	---
	language:
	- sr
	license: apache-2.0
	base_model: openai/whisper-small
	tags:
	- generated_from_trainer
	datasets:
	- espnet/yodas
	- google/fleurs
	- Sagicc/audio-lmb-ds
	- mozilla-foundation/common_voice_16_1
	metrics:
	- wer
	model-index:
	- name: Whisper Small Sr Yodas
	  results:
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: Common Voice 16_1
	      type: mozilla-foundation/common_voice_16_1
	      config: sr
	      split: test
	      args: sr
	    metrics:
	    - name: Wer
	      type: wer
	      value: 0.12195981670778992
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# Whisper Small Sr Yodas

	This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small), trained on a merge of the following datasets: Common Voice 16, Fleurs, [Juzne vesti (South news)](http://hdl.handle.net/11356/1679), [LBM](https://huggingface.co/datasets/Sagicc/audio-lmb-ds), and [Yodas](https://huggingface.co/datasets/espnet/yodas).

	Rupnik, Peter and Ljubešić, Nikola, 2022,\
	ASR training dataset for Serbian JuzneVesti-SR v1.0, Slovenian language resource repository CLARIN.SI, ISSN 2820-4042,\
	http://hdl.handle.net/11356/1679.

	It achieves the following results on the evaluation set:
	- Loss: 0.3584
	- Wer Ortho: 0.2328
	- Wer: 0.1220
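The card reports two word error rates but does not say how they differ. A minimal sketch, assuming that "Wer Ortho" is computed on the raw text and "Wer" on normalized text (here: lowercased with punctuation stripped, which is an assumption and not the card's documented procedure). The `normalize` helper and the example sentences are hypothetical.

```python
import re


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1] / max(1, len(ref))


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop punctuation, collapse spaces."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


reference = "Dobar dan, kako ste?"
hypothesis = "dobar dan kako ste"

wer_ortho = word_error_rate(reference, hypothesis)
wer_norm = word_error_rate(normalize(reference), normalize(hypothesis))
print(f"orthographic WER: {wer_ortho:.2f}, normalized WER: {wer_norm:.2f}")
```

Under this reading, casing and punctuation mistakes count toward the orthographic score only, which would be consistent with Wer Ortho (0.2328) being higher than Wer (0.1220).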

	## Model description

	This version adds the Yodas dataset to the training data as an experiment, to test whether it improves results.

	## Intended uses & limitations

	More information needed
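The card does not document usage. A minimal sketch of transcribing a Serbian audio file with the Transformers `pipeline` API, assuming the checkpoint is published under the repository id `Sagicc/whisper-small-sr-yodas-v2` and that its generation config still carries Whisper's language tokens. The `transcribe` helper and the 30-second chunking are illustrative choices, not settings taken from the card.

```python
MODEL_ID = "Sagicc/whisper-small-sr-yodas-v2"  # assumed repository id


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file; the model is downloaded on first use."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=30,  # split long recordings into 30 s windows
    )
    result = asr(
        audio_path,
        # Force Serbian transcription instead of language auto-detection.
        generate_kwargs={"language": "serbian", "task": "transcribe"},
    )
    return result["text"]
```

Calling `transcribe("sample.wav")` (a hypothetical file) returns the recognized text. Decoding audio from a file path relies on `ffmpeg` being installed.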

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 16
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 50
	- num_epochs: 10
	- mixed_precision_training: Native AMP
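To make the `linear` scheduler with 50 warmup steps concrete, here is a small sketch of the learning rate it would produce at a given step. The total step count of 20,470 is an estimate derived from the results table (step 20000 at epoch 9.77, 10 epochs), not a value stated in the card, and `linear_lr` is a hypothetical helper that mirrors the usual warmup-then-linear-decay rule.

```python
def linear_lr(step: int,
              base_lr: float = 1e-5,
              warmup_steps: int = 50,
              total_steps: int = 20_470) -> float:  # total_steps is an estimate
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 25, 50, 6_000, 20_000):
    print(f"step {step:>6}: lr = {linear_lr(step):.3e}")
```

With so few warmup steps relative to the run length, the schedule is almost a straight decay from `1e-05` to zero.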

	### Training results

	| Training Loss | Epoch | Step  | Validation Loss | Wer Ortho | Wer    |
	|:-------------:|:-----:|:-----:|:---------------:|:---------:|:------:|
	| 0.6958        | 0.49  | 1000  | 0.2114          | 0.2528    | 0.1563 |
	| 0.5941        | 0.98  | 2000  | 0.1857          | 0.2214    | 0.1269 |
	| 0.3985        | 1.46  | 3000  | 0.1729          | 0.2106    | 0.1167 |
	| 0.4187        | 1.95  | 4000  | 0.1745          | 0.2120    | 0.1147 |
	| 0.3446        | 2.44  | 5000  | 0.1770          | 0.2074    | 0.1139 |
	| 0.2992        | 2.93  | 6000  | 0.1710          | 0.2048    | 0.1061 |
	| 0.2074        | 3.42  | 7000  | 0.1887          | 0.2090    | 0.1123 |
	| 0.1958        | 3.91  | 8000  | 0.1871          | 0.2136    | 0.1131 |
	| 0.1707        | 4.39  | 9000  | 0.2069          | 0.2230    | 0.1126 |
	| 0.1403        | 4.88  | 10000 | 0.2092          | 0.2138    | 0.1110 |
	| 0.0871        | 5.37  | 11000 | 0.2345          | 0.2216    | 0.1161 |
	| 0.0856        | 5.86  | 12000 | 0.2384          | 0.2281    | 0.1161 |
	| 0.0496        | 6.35  | 13000 | 0.2657          | 0.2327    | 0.1211 |
	| 0.0542        | 6.84  | 14000 | 0.2760          | 0.2346    | 0.1198 |
	| 0.0274        | 7.32  | 15000 | 0.3024          | 0.2304    | 0.1218 |
	| 0.0281        | 7.81  | 16000 | 0.3134          | 0.2357    | 0.1216 |
	| 0.0151        | 8.3   | 17000 | 0.3328          | 0.2276    | 0.1188 |
	| 0.0165        | 8.79  | 18000 | 0.3417          | 0.2348    | 0.1220 |
	| 0.0094        | 9.28  | 19000 | 0.3545          | 0.2318    | 0.1221 |
	| 0.0125        | 9.77  | 20000 | 0.3584          | 0.2328    | 0.1220 |
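The logged results can be scanned for the best evaluation step. The sketch below copies the rows from the table above and picks the step with the lowest WER; the `Row` tuple and variable names are illustrative, and whether an intermediate checkpoint was actually saved or published is not stated in the card.

```python
from collections import namedtuple

Row = namedtuple("Row", "train_loss epoch step val_loss wer_ortho wer")

# Values copied from the training results table.
ROWS = [
    Row(0.6958, 0.49, 1000, 0.2114, 0.2528, 0.1563),
    Row(0.5941, 0.98, 2000, 0.1857, 0.2214, 0.1269),
    Row(0.3985, 1.46, 3000, 0.1729, 0.2106, 0.1167),
    Row(0.4187, 1.95, 4000, 0.1745, 0.2120, 0.1147),
    Row(0.3446, 2.44, 5000, 0.1770, 0.2074, 0.1139),
    Row(0.2992, 2.93, 6000, 0.1710, 0.2048, 0.1061),
    Row(0.2074, 3.42, 7000, 0.1887, 0.2090, 0.1123),
    Row(0.1958, 3.91, 8000, 0.1871, 0.2136, 0.1131),
    Row(0.1707, 4.39, 9000, 0.2069, 0.2230, 0.1126),
    Row(0.1403, 4.88, 10000, 0.2092, 0.2138, 0.1110),
    Row(0.0871, 5.37, 11000, 0.2345, 0.2216, 0.1161),
    Row(0.0856, 5.86, 12000, 0.2384, 0.2281, 0.1161),
    Row(0.0496, 6.35, 13000, 0.2657, 0.2327, 0.1211),
    Row(0.0542, 6.84, 14000, 0.2760, 0.2346, 0.1198),
    Row(0.0274, 7.32, 15000, 0.3024, 0.2304, 0.1218),
    Row(0.0281, 7.81, 16000, 0.3134, 0.2357, 0.1216),
    Row(0.0151, 8.30, 17000, 0.3328, 0.2276, 0.1188),
    Row(0.0165, 8.79, 18000, 0.3417, 0.2348, 0.1220),
    Row(0.0094, 9.28, 19000, 0.3545, 0.2318, 0.1221),
    Row(0.0125, 9.77, 20000, 0.3584, 0.2328, 0.1220),
]

best = min(ROWS, key=lambda r: r.wer)  # lowest normalized WER
final = ROWS[-1]                       # the step reported in the summary

print(f"best:  step {best.step}, WER {best.wer}, val loss {best.val_loss}")
print(f"final: step {final.step}, WER {final.wer}, val loss {final.val_loss}")
```

In this table the lowest WER, Wer Ortho, and validation loss all fall at step 6000, while training loss keeps decreasing afterwards. That pattern may indicate overfitting in later epochs, so the final-step numbers in the summary are not the best ones logged.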


	### Framework versions

	- Transformers 4.39.3
	- Pytorch 2.0.1+cu117
	- Datasets 2.18.0
	- Tokenizers 0.15.1