speaker-segmentation-emotion

This model is a fine-tuned version of pyannote/segmentation-3.0. The card does not name the training dataset (the dataset field is None). It achieves the following results on the evaluation set:

  • Loss: 0.4349
  • Model Preparation Time: 0.0045
  • DER (diarization error rate): 0.2041
  • False Alarm: 0.0709
  • Missed Detection: 0.1190
  • Confusion: 0.0141
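
The diarization error rate is conventionally defined as the sum of false alarm, missed detection, and speaker confusion, each normalized by the total reference speech duration. The sketch below checks that relationship against the figures listed above. It assumes the reported components are already normalized rates, and the variable names are illustrative only.

```python
# Reported evaluation metrics, copied from the list above.
false_alarm = 0.0709
missed_detection = 0.1190
confusion = 0.0141
reported_der = 0.2041

# DER is conventionally the sum of the three error components.
component_sum = false_alarm + missed_detection + confusion

# Each component is rounded to four decimals, so allow a small tolerance.
difference = abs(component_sum - reported_der)

print(f"sum of components: {component_sum:.4f}")
print(f"reported DER:      {reported_der:.4f}")
print(f"difference:        {difference:.4f}")
```

The components sum to 0.2040, which agrees with the reported 0.2041 to within rounding of the individual values.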

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 128
  • eval_batch_size: 128
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 512
  • total_eval_batch_size: 512
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30.0
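
Several of these values follow from the others. The sketch below works through that arithmetic and adds a plain-Python approximation of a cosine schedule with linear warmup and a learning-rate floor. The total step count (78480) is taken from the training results table. `min_lr_rate` is a hypothetical value chosen for illustration, because the card does not state the minimum learning rate. The function approximates the shape of the schedule and is not the Transformers implementation.

```python
import math

learning_rate = 0.001
train_batch_size = 128      # per device
num_devices = 4
warmup_ratio = 0.1
total_steps = 78480         # final step in the training results table
num_epochs = 30

# Effective batch size across all devices.
total_train_batch_size = train_batch_size * num_devices

# Optimizer steps per epoch and number of warmup steps.
steps_per_epoch = total_steps // num_epochs
warmup_steps = round(total_steps * warmup_ratio)


def lr_at(step, min_lr_rate=0.1):
    """Sketch of linear warmup followed by cosine decay to a floor.

    min_lr_rate is a HYPOTHETICAL floor (fraction of the peak rate);
    the value actually used in training is not given in the card.
    """
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return learning_rate * (min_lr_rate + (1.0 - min_lr_rate) * cosine)


print(total_train_batch_size, steps_per_epoch, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.6f}")
```

With these inputs the warmup lasts 7848 steps, which is exactly three epochs of 2616 steps each.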

Training results

Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | DER | False Alarm | Missed Detection | Confusion
0.5336 | 1.0 | 2616 | 0.5341 | 0.0045 | 0.2408 | 0.0777 | 0.1372 | 0.0259
0.5071 | 2.0 | 5232 | 0.5025 | 0.0045 | 0.2301 | 0.0716 | 0.1374 | 0.0211
0.497 | 3.0 | 7848 | 0.4931 | 0.0045 | 0.2269 | 0.0679 | 0.1384 | 0.0206
0.4837 | 4.0 | 10464 | 0.4856 | 0.0045 | 0.2247 | 0.0637 | 0.1412 | 0.0197
0.471 | 5.0 | 13080 | 0.4726 | 0.0045 | 0.2198 | 0.0648 | 0.1373 | 0.0177
0.4622 | 6.0 | 15696 | 0.4641 | 0.0045 | 0.2172 | 0.0668 | 0.1347 | 0.0157
0.459 | 7.0 | 18312 | 0.4601 | 0.0045 | 0.2147 | 0.0653 | 0.1331 | 0.0163
0.4574 | 8.0 | 20928 | 0.4571 | 0.0045 | 0.2145 | 0.0645 | 0.1338 | 0.0162
0.4672 | 9.0 | 23544 | 0.4613 | 0.0045 | 0.2147 | 0.0687 | 0.1290 | 0.0170
0.4564 | 10.0 | 26160 | 0.4630 | 0.0045 | 0.2165 | 0.0684 | 0.1306 | 0.0174
0.4554 | 11.0 | 28776 | 0.4574 | 0.0045 | 0.2140 | 0.0676 | 0.1292 | 0.0172
0.4602 | 12.0 | 31392 | 0.4638 | 0.0045 | 0.2159 | 0.0698 | 0.1276 | 0.0185
0.4489 | 13.0 | 34008 | 0.4539 | 0.0045 | 0.2124 | 0.0666 | 0.1293 | 0.0165
0.4507 | 14.0 | 36624 | 0.4554 | 0.0045 | 0.2132 | 0.0695 | 0.1267 | 0.0170
0.4503 | 15.0 | 39240 | 0.4536 | 0.0045 | 0.2119 | 0.0765 | 0.1182 | 0.0172
0.4471 | 16.0 | 41856 | 0.4472 | 0.0045 | 0.2099 | 0.0698 | 0.1242 | 0.0159
0.444 | 17.0 | 44472 | 0.4484 | 0.0045 | 0.2095 | 0.0682 | 0.1256 | 0.0157
0.4423 | 18.0 | 47088 | 0.4413 | 0.0045 | 0.2074 | 0.0667 | 0.1262 | 0.0145
0.4327 | 19.0 | 49704 | 0.4395 | 0.0045 | 0.2061 | 0.0690 | 0.1229 | 0.0142
0.4357 | 20.0 | 52320 | 0.4375 | 0.0045 | 0.2056 | 0.0689 | 0.1226 | 0.0141
0.4247 | 21.0 | 54936 | 0.4344 | 0.0045 | 0.2039 | 0.0702 | 0.1202 | 0.0134
0.4334 | 22.0 | 57552 | 0.4378 | 0.0045 | 0.2055 | 0.0700 | 0.1208 | 0.0147
0.4369 | 23.0 | 60168 | 0.4399 | 0.0045 | 0.2061 | 0.0702 | 0.1203 | 0.0157
0.4341 | 24.0 | 62784 | 0.4376 | 0.0045 | 0.2054 | 0.0709 | 0.1197 | 0.0148
0.428 | 25.0 | 65400 | 0.4366 | 0.0045 | 0.2046 | 0.0693 | 0.1211 | 0.0141
0.4324 | 26.0 | 68016 | 0.4360 | 0.0045 | 0.2047 | 0.0688 | 0.1216 | 0.0142
0.4319 | 27.0 | 70632 | 0.4358 | 0.0045 | 0.2045 | 0.0718 | 0.1183 | 0.0144
0.4317 | 28.0 | 73248 | 0.4351 | 0.0045 | 0.2042 | 0.0701 | 0.1200 | 0.0141
0.4303 | 29.0 | 75864 | 0.4348 | 0.0045 | 0.2041 | 0.0709 | 0.1191 | 0.0141
0.428 | 30.0 | 78480 | 0.4349 | 0.0045 | 0.2041 | 0.0709 | 0.1190 | 0.0141
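
The metrics in the table flatten out after roughly epoch 20. As a short sketch, the best epoch by validation loss and by DER can be located as follows. The per-epoch values are transcribed from the table above, and the list names are illustrative.

```python
# Validation loss and DER for epochs 1..30, transcribed from the table above.
val_loss = [
    0.5341, 0.5025, 0.4931, 0.4856, 0.4726, 0.4641, 0.4601, 0.4571, 0.4613, 0.4630,
    0.4574, 0.4638, 0.4539, 0.4554, 0.4536, 0.4472, 0.4484, 0.4413, 0.4395, 0.4375,
    0.4344, 0.4378, 0.4399, 0.4376, 0.4366, 0.4360, 0.4358, 0.4351, 0.4348, 0.4349,
]
der = [
    0.2408, 0.2301, 0.2269, 0.2247, 0.2198, 0.2172, 0.2147, 0.2145, 0.2147, 0.2165,
    0.2140, 0.2159, 0.2124, 0.2132, 0.2119, 0.2099, 0.2095, 0.2074, 0.2061, 0.2056,
    0.2039, 0.2055, 0.2061, 0.2054, 0.2046, 0.2047, 0.2045, 0.2042, 0.2041, 0.2041,
]

# Epochs are 1-indexed, so add 1 to the list index of the minimum.
best_loss_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
best_der_epoch = min(range(len(der)), key=der.__getitem__) + 1

print(f"lowest validation loss: {val_loss[best_loss_epoch - 1]} at epoch {best_loss_epoch}")
print(f"lowest DER:             {der[best_der_epoch - 1]} at epoch {best_der_epoch}")
```

Both criteria select epoch 21 (validation loss 0.4344, DER 0.2039), marginally better than the epoch-30 values reported at the top of the card (0.4349 and 0.2041).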

Framework versions

  • Transformers 4.46.3
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3

Safetensors

  • Model size: 1.47M params
  • Tensor type: F32