add evaluation

Files changed (5)

README.md +60 -49
evaluation_cv11_test.json +0 -0
evaluation_fleurs_test.json +0 -0
evaluation_whisper-large-v2_cv11_test.json +0 -0
evaluation_whisper-large-v2_fleurs_test.json +0 -0

README.md CHANGED

@@ -9,6 +9,7 @@ datasets:
 - mozilla-foundation/common_voice_11_0
 metrics:
 - wer
 model-index:
 - name: Whisper Large Spanish
   results:
@@ -19,71 +20,81 @@ model-index:
       name: mozilla-foundation/common_voice_11_0 es
       type: mozilla-foundation/common_voice_11_0
       config: es
-      split: validation[:1000]
       args: es
     metrics:
-    - name: Wer
       type: wer
-      value: 3.6508096148043854
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# Whisper Large Spanish
-This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) on the mozilla-foundation/common_voice_11_0 es dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.1321
-- Wer: 3.6508
-- Cer: 1.0572
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 1e-06
-- train_batch_size: 16
-- eval_batch_size: 8
-- seed: 42
-- gradient_accumulation_steps: 2
-- total_train_batch_size: 32
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 2000
-- training_steps: 20000
-- mixed_precision_training: Native AMP
-### Training results
-| Training Loss | Epoch | Step | Validation Loss | Wer    | Cer    |
-|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|
-| 0.1837        | 0.32  | 1000 | 0.1669          | 4.2442 | 1.2488 |
-| 0.1343        | 0.64  | 2000 | 0.1444          | 4.0833 | 1.2084 |
-| 0.1312        | 0.96  | 3000 | 0.1362          | 3.9324 | 1.1933 |
-| 0.1206        | 1.28  | 4000 | 0.1333          | 3.8520 | 1.1748 |
-| 0.1143        | 1.6   | 5000 | 0.1321          | 3.6508 | 1.0572 |
-| 0.1202        | 1.92  | 6000 | 0.1291          | 3.8017 | 1.1311 |
-| 0.0856        | 2.24  | 7000 | 0.1325          | 3.7011 | 1.0841 |
-| 0.1005        | 2.56  | 8000 | 0.1320          | 3.7011 | 1.0555 |
-### Framework versions
-- Transformers 4.26.0.dev0
-- Pytorch 1.13.1+cu117
-- Datasets 2.7.1.dev0
-- Tokenizers 0.13.2

 - mozilla-foundation/common_voice_11_0
 metrics:
 - wer
+- cer
 model-index:
 - name: Whisper Large Spanish
   results:
       name: mozilla-foundation/common_voice_11_0 es
       type: mozilla-foundation/common_voice_11_0
       config: es
+      split: test
       args: es
     metrics:
+    - name: WER
       type: wer
+      value: 4.673613637544826
+    - name: CER
+      type: cer
+      value: 1.5573247819517182
+  - task:
+      name: Automatic Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: google/fleurs es_419
+      type: google/fleurs
+      config: es_419
+      split: test
+      args: es_419
+    metrics:
+    - name: WER
+      type: wer
+      value: 5.396216546072705
+    - name: CER
+      type: cer
+      value: 3.450427960057061
 ---
+# Whisper Large Spanish
+This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) for Spanish, trained on the train split of [Common Voice 11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0). When using this model, make sure that your speech input is sampled at 16 kHz.
+## Usage
+```python
+from transformers import pipeline
+transcriber = pipeline(
+  "automatic-speech-recognition",
+  model="jonatasgrosman/whisper-large-es-cv11"
+)
+transcriber.model.config.forced_decoder_ids = (
+  transcriber.tokenizer.get_decoder_prompt_ids(
+    language="es",
+    task="transcribe"
+  )
+)
+transcription = transcriber("path/to/my_audio.wav")
+```
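The 16 kHz requirement stated above can be checked before transcription. The following is a minimal sketch that assumes the input is an uncompressed WAV file readable by Python's standard `wave` module; the helper names `wav_sample_rate` and `needs_resampling` are hypothetical and not part of this model card or of `transformers`.

```python
import wave

# Sample rate the model card asks for (16 kHz).
TARGET_SAMPLE_RATE = 16_000


def wav_sample_rate(path: str) -> int:
    """Return the sample rate (in Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path: str, target: int = TARGET_SAMPLE_RATE) -> bool:
    """Return True when the file's sample rate differs from the target."""
    return wav_sample_rate(path) != target
```

If `needs_resampling("path/to/my_audio.wav")` returns `True`, the audio should be resampled with a tool of your choice before it is passed to the model as a raw array.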
+## Evaluation
+The model was evaluated on the test splits of two datasets: [Common Voice 11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) (the same dataset used for fine-tuning) and [Fleurs](https://huggingface.co/datasets/google/fleurs) (not seen during fine-tuning). Because Whisper can transcribe casing and punctuation, the evaluation covers two scenarios: one using the raw text and one using normalized text (lowercased, with punctuation removed). For Fleurs, there is an additional scenario that keeps only samples without numerical values. Fleurs writes numerical values differently from Common Voice, the dataset used for fine-tuning, so this mismatch is expected to hurt the model's performance on that type of transcription in Fleurs.
+### Common Voice 11
+| | CER | WER |
+| --- | --- | --- |
+| [jonatasgrosman/whisper-large-es-cv11](https://huggingface.co/jonatasgrosman/whisper-large-es-cv11) | 2.43 | 8.85 |
+| [jonatasgrosman/whisper-large-es-cv11](https://huggingface.co/jonatasgrosman/whisper-large-es-cv11) + text normalization | 1.56 | 4.67 |
+| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 3.71 | 12.34 |
+| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + text normalization | 2.45 | 6.30 |
+### Fleurs
+| | CER | WER |
+| --- | --- | --- |
+| [jonatasgrosman/whisper-large-es-cv11](https://huggingface.co/jonatasgrosman/whisper-large-es-cv11) | 3.06 | 9.11 |
+| [jonatasgrosman/whisper-large-es-cv11](https://huggingface.co/jonatasgrosman/whisper-large-es-cv11) + text normalization | 3.45 | 5.40 |
+| [jonatasgrosman/whisper-large-es-cv11](https://huggingface.co/jonatasgrosman/whisper-large-es-cv11) + keep only non-numeric samples | 1.83 | 7.57 |
+| [jonatasgrosman/whisper-large-es-cv11](https://huggingface.co/jonatasgrosman/whisper-large-es-cv11) + text normalization + keep only non-numeric samples | 2.36 | 4.14 |
+| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 2.30 | 8.50 |
+| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + text normalization | 2.76 | 4.79 |
+| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + keep only non-numeric samples | 1.93 | 7.33 |
+| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + text normalization + keep only non-numeric samples | 2.50 | 4.28 |
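The evaluation scenarios in the README (raw text, normalized text, and non-numeric samples only) can be illustrated with a small, dependency-free sketch. Everything here is an assumption made for illustration: the commit does not show the evaluation script, so the exact normalization rules, the way numeric samples were detected, and the helper names (`normalize_text`, `edit_distance`, `wer`, `cer`, `has_digits`) are hypothetical. WER and CER are computed as edit distance over words or characters, divided by the reference length, and reported as percentages.

```python
import string

# Assumption: "punctuation" means ASCII punctuation plus the Spanish
# inverted marks; the original evaluation may have used a different set.
_PUNCTUATION = string.punctuation + "¿¡"
_STRIP_TABLE = str.maketrans("", "", _PUNCTUATION)


def normalize_text(text: str) -> str:
    """Lowercase, remove punctuation, and collapse whitespace."""
    return " ".join(text.lower().translate(_STRIP_TABLE).split())


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent (reference must be non-empty)."""
    ref_words = reference.split()
    return 100 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent (reference must be non-empty)."""
    return 100 * edit_distance(reference, hypothesis) / len(reference)


def has_digits(text: str) -> bool:
    """Assumed filter for the 'non-numeric samples' scenario."""
    return any(ch.isdigit() for ch in text)


reference = "¿Dónde está la biblioteca?"
hypothesis = "Dónde está la biblioteca"
raw_wer = wer(reference, hypothesis)
normalized_wer = wer(normalize_text(reference), normalize_text(hypothesis))
```

In this toy pair the raw WER is 50.0 because the two words carrying punctuation count as substitutions, while the normalized WER is 0.0. This is the same effect that separates the raw and "+ text normalization" rows in the tables above.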

evaluation_cv11_test.json ADDED

The diff for this file is too large to render. See raw diff

evaluation_fleurs_test.json ADDED

The diff for this file is too large to render. See raw diff

evaluation_whisper-large-v2_cv11_test.json ADDED

The diff for this file is too large to render. See raw diff

evaluation_whisper-large-v2_fleurs_test.json ADDED

The diff for this file is too large to render. See raw diff