Whisper-large-v3-no-numbers

Model info

This is a version of the openai/whisper-large-v3 model without number tokens: the token ids corresponding to numbers are excluded. No fine-tuning was used.
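As a rough illustration of what "token ids corresponding to numbers" could mean, the sketch below scans a vocabulary (a token-to-id mapping) and collects the ids of tokens that contain an ASCII digit. The exact criterion used for this model is not stated, so the digit rule, the choice to skip special tokens such as timestamps, the helper name `find_number_token_ids`, and the toy vocabulary are all assumptions made for illustration.

```python
import string

ASCII_DIGITS = set(string.digits)


def find_number_token_ids(vocab: dict[str, int]) -> list[int]:
    """Return the sorted ids of non-special tokens that contain an ASCII digit."""
    ids = []
    for token, token_id in vocab.items():
        # Assumption: special tokens like <|0.00|> (timestamps) are left alone.
        if token.startswith("<|") and token.endswith("|>"):
            continue
        # Only ASCII digits are checked, so byte-level BPE symbols are not misread.
        if any(ch in ASCII_DIGITS for ch in token):
            ids.append(token_id)
    return sorted(ids)


# Hypothetical toy vocabulary; a real one has tens of thousands of entries.
toy_vocab = {"Ġtwenty": 0, "Ġ25": 1, "7": 2, "Ġfive": 3, "<|0.00|>": 4, "Ġ1st": 5}
number_ids = find_number_token_ids(toy_vocab)
print(number_ids)  # [1, 2, 5]
```

With a real tokenizer, the mapping would presumably come from `processor.tokenizer.get_vocab()`. This model ships with such tokens already excluded, so nothing like this is needed to use it.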

Spoken numbers are transcribed as words rather than digits, which can be useful for TTS data preparation.

Example: instead of "25", this model transcribes the phrase as "twenty five".
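When preparing TTS data, it can be worth checking that a transcript really contains no digits before accepting it. The following is a minimal sketch of such a check; the helper name `has_digits` and the sample transcripts are hypothetical.

```python
import re

# Matches any ASCII digit.
_DIGIT = re.compile(r"[0-9]")


def has_digits(text: str) -> bool:
    """Return True if the text contains at least one ASCII digit."""
    return _DIGIT.search(text) is not None


# Hypothetical transcripts: the second one still contains a numeral.
transcripts = ["Twenty seven years.", "He paid 25 dollars."]
flagged = [t for t in transcripts if has_digits(t)]
print(flagged)  # ['He paid 25 dollars.']
```

Flagged transcripts could then be re-transcribed or dropped, depending on the pipeline.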

Usage

Transformers version: 4.45.2

The model can be used in the same way as the original Whisper:

>>> from transformers import WhisperProcessor, WhisperForConditionalGeneration
>>> import torchaudio

>>> # load audio
>>> wav, sr = torchaudio.load("audio.wav")
>>> # resample if necessary
>>> wav = torchaudio.functional.resample(wav, sr, 16000)

>>> # load model and processor
>>> processor = WhisperProcessor.from_pretrained("waveletdeboshir/whisper-large-v3-no-numbers")
>>> model = WhisperForConditionalGeneration.from_pretrained("waveletdeboshir/whisper-large-v3-no-numbers")

>>> # compute input features from the first audio channel
>>> input_features = processor(wav[0].numpy(), sampling_rate=16000, return_tensors="pt").input_features

>>> # generate token ids
>>> predicted_ids = model.generate(input_features)
>>> # decode token ids to text
>>> transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)
>>> transcription
['<|startoftranscript|><|en|><|transcribe|><|notimestamps|> Twenty seven years. <|endoftext|>']

The special tokens (the context tokens at the start and <|endoftext|> at the end) can be removed from the transcription by setting skip_special_tokens=True.
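If the text has already been decoded with the special tokens in place, they can also be stripped afterwards. The sketch below is a plain-Python approximation that removes anything of the form `<|...|>` and trims surrounding whitespace. It is not the tokenizer's own logic, and the helper name `strip_special_tokens` is hypothetical. Passing skip_special_tokens=True to batch_decode remains the more direct route.

```python
import re

# Matches Whisper-style special tokens such as <|en|> or <|endoftext|>.
SPECIAL_TOKEN = re.compile(r"<\|[^|]*\|>")


def strip_special_tokens(text: str) -> str:
    """Remove <|...|> markers and trim leading and trailing whitespace."""
    return SPECIAL_TOKEN.sub("", text).strip()


raw = "<|startoftranscript|><|en|><|transcribe|><|notimestamps|> Twenty seven years. <|endoftext|>"
clean = strip_special_tokens(raw)
print(clean)  # Twenty seven years.
```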

Model size: 1.54B params (Safetensors). Tensor type: F32.