---
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross_v2.12_gpt2
    results: []
---

distily_bench_obj_cross_v2.12_gpt2

This student model was distilled from the teacher model gpt2 on an unspecified dataset.

The Distily library was used for this distillation.
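
As a quick usage sketch (assuming the checkpoint is published on the Hugging Face Hub under the model name above; substitute the actual repo id if it differs), the student loads like any other gpt2-architecture causal LM:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id -- replace with the actual Hub path of this checkpoint.
model_id = "distily_bench_obj_cross_v2.12_gpt2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation to sanity-check the distilled student.
inputs = tokenizer("Knowledge distillation compresses a teacher model", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```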

It achieves the following results on the evaluation set:

  • eval_enwikippl: 563.7175
  • eval_frwikippl: 1345.9713
  • eval_zhwikippl: 833.8156
  • eval_tinystoriesppl: 794.4041
  • eval_loss: 1.4516
  • eval_runtime: 12.5731
  • eval_samples_per_second: 47.721
  • eval_steps_per_second: 11.93
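
The *ppl metrics are perplexities on held-out text from the named corpora (enwiki, frwiki, zhwiki, TinyStories). For reference, a minimal sketch of the standard perplexity computation, exp of the mean token negative log-likelihood, is shown below; the exact evaluation code used by Distily may differ (e.g. in how documents are batched and truncated):

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def perplexity(model, tokenizer, text: str) -> float:
    """Return exp(mean token NLL) of `text` under a causal LM."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean
        # cross-entropy over predicted tokens (shifted internally).
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())


tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
print(perplexity(model, tokenizer, "Perplexity is the exponential of the average loss."))
```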

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None)) (see the sketch after this list)
  • train_embeddings: True
  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.5
  • num_epochs: 1.0
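
The distillation_objective above enables only the logits term (weight 1, KL loss) and zeroes out the hidden-state and attention terms. A minimal PyTorch sketch of such a logits-only KL objective is given below; it illustrates the loss being described, not Distily's actual implementation:

```python
import torch
import torch.nn.functional as F


def logits_kl_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor) -> torch.Tensor:
    """KL(teacher || student) over the vocabulary, averaged over the batch.

    Only the logits term is computed; the hidden-state (hs) and attention
    (attn) components are omitted because their weights are 0 in this run.
    """
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    teacher_probs = F.softmax(teacher_logits, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")


# Usage sketch: run the same batch through teacher and student, then
# backpropagate through the student only.
# with torch.no_grad():
#     teacher_logits = teacher(**batch).logits
# student_logits = student(**batch).logits
# loss = logits_kl_loss(student_logits, teacher_logits)
# loss.backward()
```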

Resource Usage

Peak GPU Memory: 3.9293 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime (s) | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 270.2348 | 76.8142 | | | | | 671.1238 | 22.8030 |
| 0 | 0 | 147374.6094 | 4251118206976.0 | 19.8108 | 12.5362 | 47.862 | 11.965 | 74.6838 | 6171058503680.0 |
| 1500 | 0.0253 | 1032.8136 | 21459.9512 | 3.2195 | 12.5353 | 47.865 | 11.966 | 525.6059 | 390432.1875 |
| 3000 | 0.0505 | 729.0046 | 5114.0029 | 2.1945 | 12.5774 | 47.705 | 11.926 | 661.7311 | 17566.7754 |
| 4500 | 0.0758 | 376.6540 | 2462.6831 | 1.8598 | 12.48 | 48.077 | 12.019 | 320.3664 | 635.7976 |
| 6000 | 0.1010 | 479.0934 | 1338.6918 | 1.5893 | 12.4815 | 48.071 | 12.018 | 544.5354 | 378.7316 |
| 7500 | 0.1263 | 584.5861 | 1167.9351 | 1.5059 | 12.5218 | 47.916 | 11.979 | 802.0584 | 327.7192 |
| 9000 | 0.1515 | 563.7175 | 1345.9713 | 1.4516 | 12.5731 | 47.721 | 11.93 | 794.4041 | 833.8156 |
| 10500 | 0.1768 | 628.4745 | 969.5675 | 1.3354 | 12.5741 | 47.717 | 11.929 | 1020.3542 | 321.8916 |
| 12000 | 0.2020 | 539.9009 | 806.6226 | 1.2508 | 12.6027 | 47.609 | 11.902 | 839.1936 | 320.7986 |
| 13500 | 0.2273 | 731.5787 | 813.5260 | 1.2058 | 12.5275 | 47.895 | 11.974 | 1454.4408 | 249.3415 |
| 15000 | 0.2525 | 589.4062 | 829.6402 | 1.1651 | 12.5408 | 47.844 | 11.961 | 1055.8788 | 246.7935 |
| 16500 | 0.2778 | 516.2803 | 723.3314 | 1.1227 | 12.6335 | 47.493 | 11.873 | 901.8456 | 210.8349 |
| 18000 | 0.3030 | 504.6353 | 743.0820 | 1.1052 | 12.5796 | 47.696 | 11.924 | 890.8421 | 327.6317 |
| 19500 | 0.3283 | 573.1406 | 698.4509 | 1.1044 | 12.5126 | 47.952 | 11.988 | 1070.7333 | 267.1270 |
| 21000 | 0.3535 | 495.8198 | 711.6088 | 1.0507 | 12.5101 | 47.961 | 11.99 | 886.2881 | 210.5538 |
| 22500 | 0.3788 | 501.9647 | 659.8377 | 1.0060 | 12.5977 | 47.628 | 11.907 | 955.5714 | 225.8886 |
| 24000 | 0.4040 | 628.5231 | 696.8541 | 1.0003 | 12.5425 | 47.837 | 11.959 | 1388.9321 | 272.9261 |
| 25500 | 0.4293 | 491.1847 | 784.8514 | 0.9600 | 12.4842 | 48.061 | 12.015 | 954.0717 | 253.1456 |
| 27000 | 0.4545 | 413.3142 | 581.4585 | 0.9446 | 12.5295 | 47.887 | 11.972 | 757.5270 | 273.5640 |
| 28500 | 0.4798 | 491.1941 | 643.7033 | 0.9424 | 12.6552 | 47.411 | 11.853 | 994.0450 | 219.2680 |
| 30000 | 0.5051 | 444.4044 | 686.3331 | 0.9338 | 12.6988 | 47.249 | 11.812 | 862.3303 | 312.4154 |
| 31500 | 0.5303 | 508.9440 | 641.9151 | 0.9117 | 12.5748 | 47.714 | 11.929 | 1104.8569 | 261.0676 |
| 33000 | 0.5556 | 573.1851 | 588.2755 | 0.8677 | 12.7003 | 47.243 | 11.811 | 1374.1992 | 306.8396 |
| 34500 | 0.5808 | 436.7425 | 595.4240 | 0.8329 | 12.5456 | 47.825 | 11.956 | 926.4799 | 263.6575 |
| 36000 | 0.6061 | 430.2032 | 487.1232 | 0.8204 | 12.5922 | 47.649 | 11.912 | 907.4166 | 462.6598 |
| 37500 | 0.6313 | 433.6747 | 510.4085 | 0.8060 | 12.6333 | 47.494 | 11.873 | 948.2142 | 285.3423 |
| 39000 | 0.6566 | 425.2826 | 446.8272 | 0.7935 | 12.9122 | 46.468 | 11.617 | 915.1762 | 419.6178 |
| 40500 | 0.6818 | 433.5236 | 450.9529 | 0.7692 | 12.5718 | 47.726 | 11.931 | 968.3745 | 425.5650 |
| 42000 | 0.7071 | 422.4834 | 392.4355 | 0.6995 | 12.4907 | 48.036 | 12.009 | 986.9214 | 197.5471 |
| 43500 | 0.7323 | 382.6314 | 326.8395 | 0.6327 | 12.5524 | 47.8 | 11.95 | 900.5792 | 165.3984 |
| 45000 | 0.7576 | 379.0175 | 301.0615 | 0.6073 | 12.5527 | 47.799 | 11.95 | 902.4793 | 145.1005 |
| 46500 | 0.7828 | 373.4075 | 293.1317 | 0.5928 | 12.5641 | 47.755 | 11.939 | 885.6292 | 145.3717 |
| 48000 | 0.8081 | 368.3225 | 290.1638 | 0.5874 | 12.6164 | 47.557 | 11.889 | 876.1263 | 157.4645 |
| 49500 | 0.8333 | 369.1651 | 279.8968 | 0.5786 | 12.5106 | 47.959 | 11.99 | 887.6813 | 152.8492 |
| 51000 | 0.8586 | 364.6742 | 280.6271 | 0.5655 | 12.5057 | 47.978 | 11.995 | 881.6844 | 117.1422 |
| 52500 | 0.8838 | 356.1384 | 265.4679 | 0.5521 | 12.574 | 47.717 | 11.929 | 862.6510 | 129.9270 |
| 54000 | 0.9091 | 362.6741 | 264.8237 | 0.5466 | 12.5668 | 47.745 | 11.936 | 880.6281 | 119.0881 |
| 55500 | 0.9343 | 354.4664 | 261.9577 | 0.5430 | 12.5768 | 47.707 | 11.927 | 861.8669 | 112.1871 |
| 57000 | 0.9596 | 355.2361 | 260.7429 | 0.5403 | 12.5688 | 47.737 | 11.934 | 864.4357 | 111.7241 |
| 58500 | 0.9848 | 354.8235 | 259.3875 | 0.5396 | 12.5609 | 47.767 | 11.942 | 864.0784 | 110.2362 |
| 59400 | 1.0 | 355.2361 | 259.4972 | 0.5394 | 12.5676 | 47.742 | 11.935 | 865.5798 | 109.8253 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.21.0