---
base_model: gpt2
library_name: Distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.12_gpt2
  results: []
---

# distily_bench_obj_cross_v2.12_gpt2

This student model is distilled from the teacher model [gpt2](https://huggingface.co/gpt2) using the dataset (unspecified).

The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

It achieves the following results on the evaluation set:
- eval_enwikippl: 563.7175
- eval_frwikippl: 1345.9713
- eval_zhwikippl: 833.8156
- eval_tinystoriesppl: 794.4041
- eval_loss: 1.4516
- eval_runtime: 12.5731
- eval_samples_per_second: 47.721
- eval_steps_per_second: 11.93

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
- train_embeddings: True
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.5
- num_epochs: 1.0

### Resource Usage

Peak GPU Memory: 3.9293 GB

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 270.2348 | 76.8142 | | | | | 671.1238 | 22.8030 |
| 0 | 0 | 147374.6094 | 4251118206976.0 | 19.8108 | 12.5362 | 47.862 | 11.965 | 74.6838 | 6171058503680.0 |
| 1500 | 0.0253 | 1032.8136 | 21459.9512 | 3.2195 | 12.5353 | 47.865 | 11.966 | 525.6059 | 390432.1875 |
| 3000 | 0.0505 | 729.0046 | 5114.0029 | 2.1945 | 12.5774 | 47.705 | 11.926 | 661.7311 | 17566.7754 |
| 4500 | 0.0758 | 376.6540 | 2462.6831 | 1.8598 | 12.48 | 48.077 | 12.019 | 320.3664 | 635.7976 |
| 6000 | 0.1010 | 479.0934 | 1338.6918 | 1.5893 | 12.4815 | 48.071 | 12.018 | 544.5354 | 378.7316 |
| 7500 | 0.1263 | 584.5861 | 1167.9351 | 1.5059 | 12.5218 | 47.916 | 11.979 | 802.0584 | 327.7192 |
| 9000 | 0.1515 | 563.7175 | 1345.9713 | 1.4516 | 12.5731 | 47.721 | 11.93 | 794.4041 | 833.8156 |
| 10500 | 0.1768 | 628.4745 | 969.5675 | 1.3354 | 12.5741 | 47.717 | 11.929 | 1020.3542 | 321.8916 |
| 12000 | 0.2020 | 539.9009 | 806.6226 | 1.2508 | 12.6027 | 47.609 | 11.902 | 839.1936 | 320.7986 |
| 13500 | 0.2273 | 731.5787 | 813.5260 | 1.2058 | 12.5275 | 47.895 | 11.974 | 1454.4408 | 249.3415 |
| 15000 | 0.2525 | 589.4062 | 829.6402 | 1.1651 | 12.5408 | 47.844 | 11.961 | 1055.8788 | 246.7935 |
| 16500 | 0.2778 | 516.2803 | 723.3314 | 1.1227 | 12.6335 | 47.493 | 11.873 | 901.8456 | 210.8349 |
| 18000 | 0.3030 | 504.6353 | 743.0820 | 1.1052 | 12.5796 | 47.696 | 11.924 | 890.8421 | 327.6317 |
| 19500 | 0.3283 | 573.1406 | 698.4509 | 1.1044 | 12.5126 | 47.952 | 11.988 | 1070.7333 | 267.1270 |
| 21000 | 0.3535 | 495.8198 | 711.6088 | 1.0507 | 12.5101 | 47.961 | 11.99 | 886.2881 | 210.5538 |
| 22500 | 0.3788 | 501.9647 | 659.8377 | 1.0060 | 12.5977 | 47.628 | 11.907 | 955.5714 | 225.8886 |
| 24000 | 0.4040 | 628.5231 | 696.8541 | 1.0003 | 12.5425 | 47.837 | 11.959 | 1388.9321 | 272.9261 |
| 25500 | 0.4293 | 491.1847 | 784.8514 | 0.9600 | 12.4842 | 48.061 | 12.015 | 954.0717 | 253.1456 |
| 27000 | 0.4545 | 413.3142 | 581.4585 | 0.9446 | 12.5295 | 47.887 | 11.972 | 757.5270 | 273.5640 |
| 28500 | 0.4798 | 491.1941 | 643.7033 | 0.9424 | 12.6552 | 47.411 | 11.853 | 994.0450 | 219.2680 |
| 30000 | 0.5051 | 444.4044 | 686.3331 | 0.9338 | 12.6988 | 47.249 | 11.812 | 862.3303 | 312.4154 |
| 31500 | 0.5303 | 508.9440 | 641.9151 | 0.9117 | 12.5748 | 47.714 | 11.929 | 1104.8569 | 261.0676 |
| 33000 | 0.5556 | 573.1851 | 588.2755 | 0.8677 | 12.7003 | 47.243 | 11.811 | 1374.1992 | 306.8396 |
| 34500 | 0.5808 | 436.7425 | 595.4240 | 0.8329 | 12.5456 | 47.825 | 11.956 | 926.4799 | 263.6575 |
| 36000 | 0.6061 | 430.2032 | 487.1232 | 0.8204 | 12.5922 | 47.649 | 11.912 | 907.4166 | 462.6598 |
| 37500 | 0.6313 | 433.6747 | 510.4085 | 0.8060 | 12.6333 | 47.494 | 11.873 | 948.2142 | 285.3423 |
| 39000 | 0.6566 | 425.2826 | 446.8272 | 0.7935 | 12.9122 | 46.468 | 11.617 | 915.1762 | 419.6178 |
| 40500 | 0.6818 | 433.5236 | 450.9529 | 0.7692 | 12.5718 | 47.726 | 11.931 | 968.3745 | 425.5650 |
| 42000 | 0.7071 | 422.4834 | 392.4355 | 0.6995 | 12.4907 | 48.036 | 12.009 | 986.9214 | 197.5471 |
| 43500 | 0.7323 | 382.6314 | 326.8395 | 0.6327 | 12.5524 | 47.8 | 11.95 | 900.5792 | 165.3984 |
| 45000 | 0.7576 | 379.0175 | 301.0615 | 0.6073 | 12.5527 | 47.799 | 11.95 | 902.4793 | 145.1005 |
| 46500 | 0.7828 | 373.4075 | 293.1317 | 0.5928 | 12.5641 | 47.755 | 11.939 | 885.6292 | 145.3717 |
| 48000 | 0.8081 | 368.3225 | 290.1638 | 0.5874 | 12.6164 | 47.557 | 11.889 | 876.1263 | 157.4645 |
| 49500 | 0.8333 | 369.1651 | 279.8968 | 0.5786 | 12.5106 | 47.959 | 11.99 | 887.6813 | 152.8492 |
| 51000 | 0.8586 | 364.6742 | 280.6271 | 0.5655 | 12.5057 | 47.978 | 11.995 | 881.6844 | 117.1422 |
| 52500 | 0.8838 | 356.1384 | 265.4679 | 0.5521 | 12.574 | 47.717 | 11.929 | 862.6510 | 129.9270 |
| 54000 | 0.9091 | 362.6741 | 264.8237 | 0.5466 | 12.5668 | 47.745 | 11.936 | 880.6281 | 119.0881 |
| 55500 | 0.9343 | 354.4664 | 261.9577 | 0.5430 | 12.5768 | 47.707 | 11.927 | 861.8669 | 112.1871 |
| 57000 | 0.9596 | 355.2361 | 260.7429 | 0.5403 | 12.5688 | 47.737 | 11.934 | 864.4357 | 111.7241 |
| 58500 | 0.9848 | 354.8235 | 259.3875 | 0.5396 | 12.5609 | 47.767 | 11.942 | 864.0784 | 110.2362 |
| 59400 | 1.0 | 355.2361 | 259.4972 | 0.5394 | 12.5676 | 47.742 | 11.935 | 865.5798 | 109.8253 |

### Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0
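The distillation objective listed above applies only a KL-divergence loss on the logits (weight 1); the hidden-state and attention components are disabled (weight 0). As a minimal sketch of such a logits-only KL objective in PyTorch (not the Distily implementation; the helper name and the optional `temperature` parameter are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def logits_kl_loss(student_logits, teacher_logits, temperature=1.0):
    """Forward KL between teacher and student token distributions.

    Illustrative sketch only: a logits-only distillation objective with
    no hidden-state or attention terms, as in the config above.
    """
    # F.kl_div expects log-probabilities for the student (input)
    # and probabilities for the teacher (target).
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # "batchmean" sums over the vocabulary and averages over the batch;
    # scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```

When student and teacher logits are identical the loss is zero, and it grows as the student's predicted distribution diverges from the teacher's.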