Model description

This model classifies speakers from the frequency-domain representation of speech recordings, obtained via the Fast Fourier Transform (FFT). It is a 1D convolutional network with residual connections for audio classification.
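As a rough illustration of what "frequency-domain representation" means here, the sketch below turns a one-second waveform into FFT magnitudes with NumPy. It is not the notebook's code: the function name audio_to_fft, the use of NumPy instead of TensorFlow ops, and the choice to keep only the positive-frequency half are assumptions made for the example.

```python
import numpy as np

SAMPLING_RATE = 16000  # Hz, the rate this card states for the audio


def audio_to_fft(audio: np.ndarray) -> np.ndarray:
    """Return the magnitudes of the positive-frequency half of the FFT."""
    spectrum = np.fft.fft(audio)
    # For real-valued input the magnitude spectrum is symmetric,
    # so the first half carries all the information.
    return np.abs(spectrum[: len(audio) // 2])


# One second of a 440 Hz tone as a stand-in for a speech recording.
t = np.arange(SAMPLING_RATE) / SAMPLING_RATE
tone = np.sin(2 * np.pi * 440 * t)
features = audio_to_fft(tone)
```

With one second of audio at 16000 Hz, each FFT bin is 1 Hz wide, so the strongest bin of the test tone sits at index 440.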

This repo contains the model for the notebook Speaker Recognition.

Full credits go to Fadi Badine.

Dataset Used

This model uses a speaker recognition dataset from Kaggle.

Intended uses & limitations

This model should be run with TensorFlow 2.3 or higher, or with tf-nightly. The noise samples in the dataset must be resampled to a sampling rate of 16000 Hz before they are used with this model; to do this, you will need to have ffmpeg installed.
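One possible way to do the resampling from Python is to call ffmpeg through subprocess, as sketched below. The helper names and file paths are hypothetical; only the ffmpeg options -y (overwrite the output), -i (input file) and -ar (audio sampling rate) are relied on.

```python
import subprocess

TARGET_RATE = 16000  # Hz, as required by this model


def build_resample_command(src, dst, rate=TARGET_RATE):
    """Build the ffmpeg argument list that resamples src into dst."""
    return ["ffmpeg", "-y", "-i", str(src), "-ar", str(rate), str(dst)]


def resample(src, dst, rate=TARGET_RATE):
    """Run ffmpeg; raises CalledProcessError if ffmpeg reports a failure."""
    subprocess.run(build_resample_command(src, dst, rate), check=True)


# Example call (requires ffmpeg on PATH and an existing input file):
# resample("noise/original.wav", "noise/resampled.wav")
```

Writing to a separate output file, rather than resampling in place, keeps the original noise samples available if the conversion has to be repeated.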

Training and evaluation data

During dataset preparation, the speech samples and background noise samples were sorted into two folders, audio and noise. The noise samples were then resampled to 16000 Hz, and the background noise was added to the speech samples to augment the data. Finally, the FFT of these samples was fed to the model for training and evaluation.
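The augmentation step can be pictured as mixing a scaled noise clip into each speech clip. The sketch below is one plausible version, not necessarily what the notebook does: the function name add_noise, the peak-matching rule, and the default scale of 0.5 are all assumptions for illustration.

```python
import numpy as np


def add_noise(audio: np.ndarray, noise: np.ndarray, scale: float = 0.5) -> np.ndarray:
    """Mix noise into audio after matching the noise peak to the audio peak."""
    noise_peak = np.max(np.abs(noise))
    if noise_peak == 0:
        # Silent noise clip: nothing to add.
        return audio.copy()
    # Bring the noise to the same peak amplitude as the speech,
    # then attenuate it by the (assumed) scale factor.
    proportion = np.max(np.abs(audio)) / noise_peak
    return audio + noise * proportion * scale


speech = np.array([0.0, 1.0, -1.0, 0.5])
background = np.array([0.2, -0.2, 0.1, 0.0])
augmented = add_noise(speech, background)
```

Both clips must have the same length and sampling rate, which is why the noise has to be resampled to 16000 Hz first.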

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

Hyperparameter       Value
name                 Adam
learning_rate        0.0010000000474974513
decay                0.0
beta_1               0.8999999761581421
beta_2               0.9990000128746033
epsilon              1e-07
amsgrad              False
training_precision   float32
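The long decimals listed above look like the float32 roundings of the familiar values 0.001, 0.9 and 0.999, which would match the float32 training precision. The standard-library sketch below reproduces them; reading the values this way is an interpretation, and the names as_float32 and adam_config are hypothetical.

```python
import struct


def as_float32(value: float) -> float:
    """Round a Python float to the nearest float32 and return it as a float."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Optimizer settings as they appear after a round trip through float32.
adam_config = {
    "learning_rate": as_float32(0.001),
    "beta_1": as_float32(0.9),
    "beta_2": as_float32(0.999),
    "epsilon": 1e-07,  # listed as a plain Python float
    "amsgrad": False,
}
```

Assuming TensorFlow is installed, a dictionary like this could be passed as keras.optimizers.Adam(**adam_config); the decay of 0.0 is left out because it means no learning-rate decay is applied.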

Model Plot

[Image: model plot]

Model by: Kavya Bisht