|
--- |
|
base_model: gpt2 |
|
library_name: Distily |
|
license: mit |
|
tags: |
|
- generated_from_trainer |
|
model-index: |
|
- name: distily_bench_obj_cross_v2.12_gpt2 |
|
results: [] |
|
--- |
|
|
|
# distily_bench_obj_cross_v2.12_gpt2 |
|
|
|
This student model was distilled from the teacher model [gpt2](https://huggingface.co/gpt2) using an unspecified dataset.
|
|
|
The [Distily](https://github.com/lapp0/distily) library was used for this distillation. |
|
|
|
It achieves the following results on the evaluation set: |
|
- eval_enwikippl: 653.3577 |
|
- eval_frwikippl: 986.1998 |
|
- eval_zhwikippl: 379.8699 |
|
- eval_tinystoriesppl: 1082.1683 |
|
- eval_loss: 1.3023 |
|
- eval_runtime: 12.5969 |
|
- eval_samples_per_second: 47.631 |
|
- eval_steps_per_second: 11.908 |
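
The `*ppl` metrics above are perplexities on held-out slices of the named corpora (English/French/Chinese Wikipedia and TinyStories). As a general reminder of how loss and perplexity relate, perplexity is the exponential of the mean per-token negative log-likelihood; note that the per-corpus perplexities are computed on their own data, so they are not simply `exp(eval_loss)`. A minimal sketch of the general formula (not Distily-specific code):

```python
import math

def perplexity(mean_nll: float) -> float:
    """Perplexity is exp of the mean per-token negative log-likelihood."""
    return math.exp(mean_nll)

# A model that assigns probability 1 to every token has mean NLL 0,
# hence the minimum possible perplexity of 1.0.
assert perplexity(0.0) == 1.0
```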
|
|
|
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- distillation_objective: `DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))`
|
- train_embeddings: True |
|
- learning_rate: 0.0001 |
|
- train_batch_size: 1 |
|
- eval_batch_size: 4 |
|
- seed: 42 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: linear |
|
- lr_scheduler_warmup_ratio: 0.1 |
|
- num_epochs: 1.0 |
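
The `distillation_objective` above places all of its weight on a KL-divergence loss between teacher and student logits (hidden-state and attention components have weight 0). As a minimal pure-Python sketch of that logits-KL term for a single token position (the actual Distily implementation works on batched tensors and may differ in reduction and temperature handling):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_logits_loss(teacher_logits, student_logits):
    """KL(teacher || student) over one token's vocabulary distribution."""
    p = softmax(teacher_logits)  # teacher distribution (the target)
    q = softmax(student_logits)  # student distribution
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical logits give zero loss; diverging logits increase it.
assert kl_logits_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) < 1e-12
```

Minimizing this term drives the student's next-token distribution toward the teacher's, which is why training loss and the perplexity metrics fall together in the table below.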
|
|
|
### Resource Usage |
|
Peak GPU Memory: 3.9293 GB |
|
|
|
### Eval-Phase Metrics |
|
| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl | |
|
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | |
|
| **teacher eval** | | 270.2348 | 76.8142 | | | | | 671.1238 | 22.8030 | |
|
| 0 | 0 | 147374.6094 | 4251118206976.0 | 19.8108 | 12.5898 | 47.658 | 11.914 | 74.6838 | 6171058503680.0 | |
|
| 1500 | 0.0253 | 995.8284 | 4478.0557 | 2.2057 | 12.629 | 47.51 | 11.877 | 1054.7445 | 39317.4570 | |
|
| 3000 | 0.0505 | 759.2491 | 2876.1150 | 1.7221 | 12.6775 | 47.328 | 11.832 | 930.6636 | 1598.6740 | |
|
| 4500 | 0.0758 | 679.3580 | 1449.2272 | 1.5342 | 12.6534 | 47.418 | 11.855 | 954.7816 | 415.1080 | |
|
| 6000 | 0.1010 | 706.9536 | 1264.4604 | 1.4442 | 12.6336 | 47.492 | 11.873 | 1114.5806 | 874.3105 | |
|
| 7500 | 0.1263 | 581.0081 | 953.5186 | 1.3672 | 12.5682 | 47.74 | 11.935 | 860.4433 | 287.9040 | |
|
| 9000 | 0.1515 | 653.3577 | 986.1998 | 1.3023 | 12.5969 | 47.631 | 11.908 | 1082.1683 | 379.8699 | |
|
| 10500 | 0.1768 | 634.6018 | 878.6852 | 1.2366 | 12.5486 | 47.814 | 11.954 | 1111.3147 | 267.4301 | |
|
| 12000 | 0.2020 | 543.3941 | 782.5607 | 1.1708 | 12.6162 | 47.558 | 11.889 | 914.1931 | 280.9046 | |
|
| 13500 | 0.2273 | 621.1537 | 751.0798 | 1.1457 | 12.6507 | 47.428 | 11.857 | 1146.2101 | 287.0221 | |
|
| 15000 | 0.2525 | 576.3350 | 773.9283 | 1.1070 | 12.6882 | 47.288 | 11.822 | 1048.3120 | 244.8425 | |
|
| 16500 | 0.2778 | 524.7780 | 686.7684 | 1.0660 | 12.6142 | 47.565 | 11.891 | 963.1450 | 180.7172 | |
|
| 18000 | 0.3030 | 547.1536 | 748.9669 | 1.0617 | 12.6351 | 47.487 | 11.872 | 1048.8325 | 393.3814 | |
|
| 19500 | 0.3283 | 521.4248 | 608.5453 | 1.0117 | 12.6667 | 47.368 | 11.842 | 1005.0343 | 194.0343 | |
|
| 21000 | 0.3535 | 492.6230 | 757.1074 | 0.9890 | 12.6396 | 47.47 | 11.867 | 925.2551 | 316.0413 | |
|
| 22500 | 0.3788 | 508.8848 | 631.0673 | 0.9599 | 12.5581 | 47.778 | 11.944 | 1014.2992 | 269.3275 | |
|
| 24000 | 0.4040 | 448.4678 | 634.5434 | 0.9540 | 12.6193 | 47.546 | 11.887 | 838.1882 | 182.7780 | |
|
| 25500 | 0.4293 | 465.3311 | 685.5602 | 0.9076 | 12.6325 | 47.497 | 11.874 | 941.0688 | 236.3699 | |
|
| 27000 | 0.4545 | 455.5760 | 536.7122 | 0.8543 | 12.6616 | 47.387 | 11.847 | 944.9666 | 158.6557 | |
|
| 28500 | 0.4798 | 422.2133 | 444.7551 | 0.7497 | 12.7174 | 47.179 | 11.795 | 918.8527 | 161.5927 | |
|
| 30000 | 0.5051 | 404.8533 | 401.2530 | 0.7146 | 12.5557 | 47.787 | 11.947 | 903.7859 | 159.8987 | |
|
| 31500 | 0.5303 | 401.0141 | 391.1385 | 0.6968 | 12.5584 | 47.777 | 11.944 | 901.9575 | 144.2610 | |
|
| 33000 | 0.5556 | 414.6530 | 376.1317 | 0.6896 | 12.6093 | 47.584 | 11.896 | 957.7856 | 160.5613 | |
|
| 34500 | 0.5808 | 403.2803 | 388.9411 | 0.6821 | 12.5399 | 47.847 | 11.962 | 924.6055 | 165.9398 | |
|
| 36000 | 0.6061 | 394.4821 | 343.9616 | 0.6697 | 12.5519 | 47.801 | 11.95 | 889.5546 | 170.7110 | |
|
| 37500 | 0.6313 | 400.1528 | 363.8464 | 0.6703 | 12.5536 | 47.795 | 11.949 | 920.4871 | 147.2159 | |
|
| 39000 | 0.6566 | 391.2865 | 364.2054 | 0.6676 | 12.5746 | 47.715 | 11.929 | 891.6525 | 156.6264 | |
|
| 40500 | 0.6818 | 388.4776 | 368.1123 | 0.6612 | 12.5571 | 47.782 | 11.945 | 888.4889 | 139.5851 | |
|
| 42000 | 0.7071 | 400.2923 | 352.6450 | 0.6593 | 12.5709 | 47.729 | 11.932 | 929.3182 | 138.6479 | |
|
| 43500 | 0.7323 | 387.7111 | 360.0483 | 0.6497 | 12.6167 | 47.556 | 11.889 | 881.3199 | 138.9349 | |
|
| 45000 | 0.7576 | 380.8126 | 334.1832 | 0.6313 | 12.6877 | 47.29 | 11.822 | 876.7783 | 125.0634 | |
|
| 46500 | 0.7828 | 380.8054 | 327.5193 | 0.6242 | 12.5708 | 47.73 | 11.932 | 882.1217 | 129.8663 | |
|
| 48000 | 0.8081 | 377.8082 | 338.2561 | 0.6204 | 12.6081 | 47.589 | 11.897 | 877.0321 | 131.2159 | |
|
| 49500 | 0.8333 | 379.1130 | 327.4732 | 0.6185 | 12.5502 | 47.808 | 11.952 | 883.5084 | 123.8266 | |
|
| 51000 | 0.8586 | 377.6328 | 326.7014 | 0.6177 | 12.6001 | 47.619 | 11.905 | 880.3737 | 123.1512 | |
|
| 52500 | 0.8838 | 376.4498 | 325.6333 | 0.6136 | 12.7004 | 47.242 | 11.811 | 876.8870 | 121.4464 | |
|
| 54000 | 0.9091 | 377.0334 | 324.0776 | 0.6123 | 12.7392 | 47.099 | 11.775 | 879.5005 | 121.6815 | |
|
| 55500 | 0.9343 | 377.6328 | 325.2666 | 0.6112 | 12.661 | 47.39 | 11.847 | 881.6116 | 121.6897 | |
|
| 57000 | 0.9596 | 376.8437 | 323.6670 | 0.6106 | 12.6149 | 47.563 | 11.891 | 879.0644 | 121.3654 | |
|
| 58500 | 0.9848 | 376.7562 | 324.3744 | 0.6101 | 12.5659 | 47.748 | 11.937 | 879.3189 | 121.1148 | |
|
| 59400 | 1.0 | 376.9021 | 324.4201 | 0.6100 | 12.5762 | 47.709 | 11.927 | 880.1915 | 121.0986 | |
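
The run above spans 59,400 optimizer steps with `lr_scheduler_type: linear` and `lr_scheduler_warmup_ratio: 0.1`, i.e. the learning rate ramps up linearly over the first 10% of steps and then decays linearly to zero. A sketch of that schedule, assuming the shape used by `transformers`' linear scheduler (the exact implementation may differ in rounding details):

```python
def linear_warmup_linear_decay(step, total_steps=59400, warmup_ratio=0.1, base_lr=1e-4):
    """Linear warmup for the first `warmup_ratio` of steps, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# The peak learning rate (0.0001 above) is reached at the end of warmup.
assert linear_warmup_linear_decay(0) == 0.0
assert linear_warmup_linear_decay(59400) == 0.0
```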
|
|
|
### Framework versions |
|
- Distily 0.2.0 |
|
- Transformers 4.44.0 |
|
- Pytorch 2.3.0 |
|
- Datasets 2.21.0 |
|
|