speaker-segmentation-emotion

This model is a fine-tuned version of pyannote/segmentation-3.0. The training dataset is not specified. It achieves the following results on the evaluation set:

Loss: 0.4349
Model Preparation Time: 0.0045
DER (diarization error rate): 0.2041
False Alarm: 0.0709
Missed Detection: 0.1190
Confusion: 0.0141
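
The diarization error rate is conventionally the sum of the false alarm, missed detection, and speaker confusion rates. As a quick sanity check, the sketch below recomputes that sum from the figures reported above. It assumes this conventional decomposition applies to the card's metrics, which the card itself does not state.

```python
# Component rates copied from the evaluation results above.
false_alarm = 0.0709
missed_detection = 0.1190
confusion = 0.0141
reported_der = 0.2041

# Assumption: DER = false alarm + missed detection + confusion.
der_from_components = false_alarm + missed_detection + confusion

# Every figure is rounded to four decimals, so compare with a small tolerance.
difference = abs(der_from_components - reported_der)

print(f"sum of components: {der_from_components:.4f}")
print(f"reported DER:      {reported_der:.4f}")
print(f"difference:        {difference:.4f}")
```

The two values agree to within rounding of the four-decimal figures. Missed detection accounts for more than half of the total error.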

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 128
eval_batch_size: 128
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 512
total_eval_batch_size: 512
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine_with_min_lr
lr_scheduler_warmup_ratio: 0.1
num_epochs: 30.0
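
The derived quantities in this list can be reproduced with a little arithmetic. The following is a minimal sketch, not the training code: it recomputes the total batch size and the warmup length, and approximates a cosine schedule with a learning-rate floor. The card does not report the floor, so `MIN_LR_RATE` is a hypothetical placeholder, and `lr_at` is an illustrative helper rather than a library function.

```python
import math

# Values copied from the hyperparameter list and the training results table.
per_device_batch_size = 128
num_devices = 4
base_lr = 1e-3
warmup_ratio = 0.1
total_steps = 78480  # final step in the training results table

# Assumption: the card does not state the minimum learning rate.
MIN_LR_RATE = 0.1  # hypothetical floor, as a fraction of base_lr

total_batch_size = per_device_batch_size * num_devices
warmup_steps = round(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Linear warmup to base_lr, then cosine decay down to the floor."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return base_lr * (cosine * (1.0 - MIN_LR_RATE) + MIN_LR_RATE)


print("total batch size:", total_batch_size)
print("warmup steps:", warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.6f}")
```

Under these assumptions the warmup covers 7848 steps, which is exactly the first three epochs at 2616 steps per epoch, and the total batch size matches the listed value of 512.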

Training results

Training Loss	Epoch	Step	Validation Loss	Model Preparation Time	Der	False Alarm	Missed Detection	Confusion
0.5336	1.0	2616	0.5341	0.0045	0.2408	0.0777	0.1372	0.0259
0.5071	2.0	5232	0.5025	0.0045	0.2301	0.0716	0.1374	0.0211
0.497	3.0	7848	0.4931	0.0045	0.2269	0.0679	0.1384	0.0206
0.4837	4.0	10464	0.4856	0.0045	0.2247	0.0637	0.1412	0.0197
0.471	5.0	13080	0.4726	0.0045	0.2198	0.0648	0.1373	0.0177
0.4622	6.0	15696	0.4641	0.0045	0.2172	0.0668	0.1347	0.0157
0.459	7.0	18312	0.4601	0.0045	0.2147	0.0653	0.1331	0.0163
0.4574	8.0	20928	0.4571	0.0045	0.2145	0.0645	0.1338	0.0162
0.4672	9.0	23544	0.4613	0.0045	0.2147	0.0687	0.1290	0.0170
0.4564	10.0	26160	0.4630	0.0045	0.2165	0.0684	0.1306	0.0174
0.4554	11.0	28776	0.4574	0.0045	0.2140	0.0676	0.1292	0.0172
0.4602	12.0	31392	0.4638	0.0045	0.2159	0.0698	0.1276	0.0185
0.4489	13.0	34008	0.4539	0.0045	0.2124	0.0666	0.1293	0.0165
0.4507	14.0	36624	0.4554	0.0045	0.2132	0.0695	0.1267	0.0170
0.4503	15.0	39240	0.4536	0.0045	0.2119	0.0765	0.1182	0.0172
0.4471	16.0	41856	0.4472	0.0045	0.2099	0.0698	0.1242	0.0159
0.444	17.0	44472	0.4484	0.0045	0.2095	0.0682	0.1256	0.0157
0.4423	18.0	47088	0.4413	0.0045	0.2074	0.0667	0.1262	0.0145
0.4327	19.0	49704	0.4395	0.0045	0.2061	0.0690	0.1229	0.0142
0.4357	20.0	52320	0.4375	0.0045	0.2056	0.0689	0.1226	0.0141
0.4247	21.0	54936	0.4344	0.0045	0.2039	0.0702	0.1202	0.0134
0.4334	22.0	57552	0.4378	0.0045	0.2055	0.0700	0.1208	0.0147
0.4369	23.0	60168	0.4399	0.0045	0.2061	0.0702	0.1203	0.0157
0.4341	24.0	62784	0.4376	0.0045	0.2054	0.0709	0.1197	0.0148
0.428	25.0	65400	0.4366	0.0045	0.2046	0.0693	0.1211	0.0141
0.4324	26.0	68016	0.4360	0.0045	0.2047	0.0688	0.1216	0.0142
0.4319	27.0	70632	0.4358	0.0045	0.2045	0.0718	0.1183	0.0144
0.4317	28.0	73248	0.4351	0.0045	0.2042	0.0701	0.1200	0.0141
0.4303	29.0	75864	0.4348	0.0045	0.2041	0.0709	0.1191	0.0141
0.428	30.0	78480	0.4349	0.0045	0.2041	0.0709	0.1190	0.0141
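
The headline results at the top of the card match the final row (epoch 30), while the lowest validation loss and DER in the table occur earlier. The sketch below illustrates how one might pick the best epoch from such a log. It uses only a subset of rows copied from the table, and the selection rule is an illustration, not something the card says was applied.

```python
# A subset of rows copied from the table above: (epoch, validation loss, DER).
rows = [
    (1, 0.5341, 0.2408),
    (10, 0.4630, 0.2165),
    (20, 0.4375, 0.2056),
    (21, 0.4344, 0.2039),
    (29, 0.4348, 0.2041),
    (30, 0.4349, 0.2041),
]

# Pick the epoch with the lowest validation loss and the lowest DER.
best_by_loss = min(rows, key=lambda row: row[1])
best_by_der = min(rows, key=lambda row: row[2])
final = rows[-1]

# Gap between the final checkpoint and the best logged DER.
der_gap = final[2] - best_by_der[2]

print("best epoch by validation loss:", best_by_loss[0])
print("best epoch by DER:", best_by_der[0])
print(f"final-epoch DER minus best DER: {der_gap:.4f}")
```

The gap between the final and best DER is small, which suggests the metrics had largely plateaued over the last ten epochs.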

Framework versions

Transformers 4.46.3
Pytorch 2.5.1+cu124
Datasets 3.1.0
Tokenizers 0.20.3

Model repository: cadazar/speaker-segmentation-emotion