scenario-NON-KD-PR-COPY-D2_data-AmazonScience_massive_all_1_166sss
This model is a fine-tuned version of microsoft/mdeberta-v3-base on the AmazonScience/massive dataset. It achieves the following results on the evaluation set (these match the final logged checkpoint, step 185000):
- Loss: 1.3934
- Accuracy: 0.8438
- F1: 0.8185
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 66
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
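As a rough sketch, the list above can be expressed as keyword arguments using the field names of `transformers.TrainingArguments`. Two assumptions are made here that the card does not confirm: that `train_batch_size` and `eval_batch_size` are per-device values on a single device, and that the linear schedule ran without warmup (none is listed). The step count per epoch is not stated either; it is estimated from the last logged row of the results table (step 185000 at epoch 9.8872). The names `training_kwargs` and `linear_lr` are illustrative only.

```python
# Hyperparameters from the card, keyed by transformers.TrainingArguments field names.
# Assumption: the card's batch sizes are per-device values (single-device run).
training_kwargs = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 32,
    "seed": 66,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 10,
}

# Assumption: steps per epoch estimated from the final logged row
# (step 185000 at epoch 9.8872); the card does not state it directly.
STEPS_PER_EPOCH = round(185000 / 9.8872)
TOTAL_STEPS = STEPS_PER_EPOCH * training_kwargs["num_train_epochs"]


def linear_lr(step, base_lr=training_kwargs["learning_rate"], total_steps=TOTAL_STEPS):
    """Learning rate under linear decay to zero, assuming no warmup."""
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / total_steps)


if __name__ == "__main__":
    print("estimated steps per epoch:", STEPS_PER_EPOCH)
    print("estimated total steps:", TOTAL_STEPS)
    for step in (0, 30000, 95000, 185000):
        print(step, linear_lr(step))
```

If the Transformers library is installed, the dictionary could be passed as `TrainingArguments(output_dir="out", **training_kwargs)`, where `output_dir` is a placeholder path.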
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 |
---|---|---|---|---|---|
1.4661 | 0.2672 | 5000 | 1.3764 | 0.6250 | 0.4910 |
0.9918 | 0.5344 | 10000 | 1.0170 | 0.7378 | 0.6346 |
0.7796 | 0.8017 | 15000 | 0.8450 | 0.7801 | 0.7141 |
0.6113 | 1.0689 | 20000 | 0.8091 | 0.8038 | 0.7581 |
0.5684 | 1.3361 | 25000 | 0.7564 | 0.8141 | 0.7690 |
0.5164 | 1.6033 | 30000 | 0.7448 | 0.8215 | 0.7803 |
0.4654 | 1.8706 | 35000 | 0.7588 | 0.8296 | 0.7943 |
0.3718 | 2.1378 | 40000 | 0.7709 | 0.8308 | 0.7931 |
0.3378 | 2.4050 | 45000 | 0.7686 | 0.8321 | 0.7951 |
0.3404 | 2.6722 | 50000 | 0.7799 | 0.8324 | 0.7954 |
0.3294 | 2.9394 | 55000 | 0.7557 | 0.8363 | 0.8021 |
0.2571 | 3.2067 | 60000 | 0.8100 | 0.8371 | 0.8063 |
0.2545 | 3.4739 | 65000 | 0.8222 | 0.8358 | 0.8072 |
0.2571 | 3.7411 | 70000 | 0.8126 | 0.8403 | 0.8158 |
0.2324 | 4.0083 | 75000 | 0.8535 | 0.8387 | 0.8081 |
0.1938 | 4.2756 | 80000 | 0.8975 | 0.8368 | 0.8064 |
0.1853 | 4.5428 | 85000 | 0.8940 | 0.8406 | 0.8120 |
0.1880 | 4.8100 | 90000 | 0.8870 | 0.8406 | 0.8132 |
0.1428 | 5.0772 | 95000 | 0.9963 | 0.8423 | 0.8213 |
0.1473 | 5.3444 | 100000 | 0.9991 | 0.8395 | 0.8145 |
0.1409 | 5.6117 | 105000 | 1.0564 | 0.8357 | 0.8080 |
0.1445 | 5.8789 | 110000 | 0.9895 | 0.8420 | 0.8134 |
0.1098 | 6.1461 | 115000 | 1.1040 | 0.8431 | 0.8150 |
0.1136 | 6.4133 | 120000 | 1.1074 | 0.8430 | 0.8195 |
0.1096 | 6.6806 | 125000 | 1.1357 | 0.8400 | 0.8122 |
0.1136 | 6.9478 | 130000 | 1.1148 | 0.8416 | 0.8191 |
0.0912 | 7.2150 | 135000 | 1.2180 | 0.8408 | 0.8133 |
0.0822 | 7.4822 | 140000 | 1.2177 | 0.8426 | 0.8176 |
0.0880 | 7.7495 | 145000 | 1.2107 | 0.8420 | 0.8158 |
0.0777 | 8.0167 | 150000 | 1.2180 | 0.8444 | 0.8210 |
0.0667 | 8.2839 | 155000 | 1.3110 | 0.8394 | 0.8152 |
0.0637 | 8.5511 | 160000 | 1.3150 | 0.8439 | 0.8189 |
0.0649 | 8.8183 | 165000 | 1.3342 | 0.8417 | 0.8161 |
0.0463 | 9.0856 | 170000 | 1.3651 | 0.8432 | 0.8186 |
0.0496 | 9.3528 | 175000 | 1.3863 | 0.8424 | 0.8161 |
0.0585 | 9.6200 | 180000 | 1.3898 | 0.8433 | 0.8183 |
0.0527 | 9.8872 | 185000 | 1.3934 | 0.8438 | 0.8185 |
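The table suggests that validation loss bottoms out early and then rises steadily while accuracy and F1 plateau, a pattern usually read as overfitting. The following sketch locates the best checkpoint under each metric, using the step, validation loss, accuracy, and F1 columns copied from the table. The names `rows` and `best_by` are illustrative and not part of the original training code.

```python
# (step, validation_loss, accuracy, f1), copied from the results table.
rows = [
    (5000, 1.3764, 0.6250, 0.4910), (10000, 1.0170, 0.7378, 0.6346),
    (15000, 0.8450, 0.7801, 0.7141), (20000, 0.8091, 0.8038, 0.7581),
    (25000, 0.7564, 0.8141, 0.7690), (30000, 0.7448, 0.8215, 0.7803),
    (35000, 0.7588, 0.8296, 0.7943), (40000, 0.7709, 0.8308, 0.7931),
    (45000, 0.7686, 0.8321, 0.7951), (50000, 0.7799, 0.8324, 0.7954),
    (55000, 0.7557, 0.8363, 0.8021), (60000, 0.8100, 0.8371, 0.8063),
    (65000, 0.8222, 0.8358, 0.8072), (70000, 0.8126, 0.8403, 0.8158),
    (75000, 0.8535, 0.8387, 0.8081), (80000, 0.8975, 0.8368, 0.8064),
    (85000, 0.8940, 0.8406, 0.8120), (90000, 0.8870, 0.8406, 0.8132),
    (95000, 0.9963, 0.8423, 0.8213), (100000, 0.9991, 0.8395, 0.8145),
    (105000, 1.0564, 0.8357, 0.8080), (110000, 0.9895, 0.8420, 0.8134),
    (115000, 1.1040, 0.8431, 0.8150), (120000, 1.1074, 0.8430, 0.8195),
    (125000, 1.1357, 0.8400, 0.8122), (130000, 1.1148, 0.8416, 0.8191),
    (135000, 1.2180, 0.8408, 0.8133), (140000, 1.2177, 0.8426, 0.8176),
    (145000, 1.2107, 0.8420, 0.8158), (150000, 1.2180, 0.8444, 0.8210),
    (155000, 1.3110, 0.8394, 0.8152), (160000, 1.3150, 0.8439, 0.8189),
    (165000, 1.3342, 0.8417, 0.8161), (170000, 1.3651, 0.8432, 0.8186),
    (175000, 1.3863, 0.8424, 0.8161), (180000, 1.3898, 0.8433, 0.8183),
    (185000, 1.3934, 0.8438, 0.8185),
]


def best_by(column, lower_is_better=False):
    """Return the row that is best on the given column index (1=loss, 2=accuracy, 3=F1)."""
    pick = min if lower_is_better else max
    return pick(rows, key=lambda row: row[column])


best_loss = best_by(1, lower_is_better=True)
best_accuracy = best_by(2)
best_f1 = best_by(3)
final = rows[-1]

if __name__ == "__main__":
    print("lowest validation loss:", best_loss)
    print("highest accuracy:", best_accuracy)
    print("highest F1:", best_f1)
    print("final checkpoint:", final)
```

On these rows the lowest validation loss (0.7448) falls at step 30000, the highest accuracy (0.8444) at step 150000, and the highest F1 (0.8213) at step 95000, whereas the headline results come from the final step, 185000.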
Framework versions
- Transformers 4.44.2
- Pytorch 2.1.1+cu121
- Datasets 2.14.5
- Tokenizers 0.19.1
Model tree for haryoaw/scenario-NON-KD-PR-COPY-D2_data-AmazonScience_massive_all_1_166sss
Base model
microsoft/mdeberta-v3-base