pixel-tiny-bigrams

This model is a fine-tuned version of on the wikipedia + bookcorpus dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3357

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0006
  • train_batch_size: 128
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 1024
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • training_steps: 250000
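The batch-size and schedule entries above are tied together by simple arithmetic. The sketch below is a minimal illustration, not the training code. It derives the effective batch sizes and the warmup length, then evaluates a linear-warmup-plus-cosine-decay learning rate at a given step. It rests on several assumptions. The `cosine` scheduler is assumed to behave like Transformers' `get_cosine_schedule_with_warmup` with its default half cycle. Warmup is assumed to last `ceil(training_steps * warmup_ratio)` steps. No gradient accumulation is assumed, since none is listed. The helper name `cosine_lr_with_warmup` is hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 6e-4
TRAIN_BATCH_SIZE = 128      # per device
EVAL_BATCH_SIZE = 8         # per device
NUM_DEVICES = 8
WARMUP_RATIO = 0.05
TRAINING_STEPS = 250_000

# Assumption: no gradient accumulation, so effective batch = per-device batch x devices.
total_train_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES   # matches total_train_batch_size
total_eval_batch_size = EVAL_BATCH_SIZE * NUM_DEVICES     # matches total_eval_batch_size

# Assumption: warmup length is the ratio applied to the total step count, rounded up.
warmup_steps = math.ceil(TRAINING_STEPS * WARMUP_RATIO)


def cosine_lr_with_warmup(step: int,
                          base_lr: float = LEARNING_RATE,
                          warmup: int = warmup_steps,
                          total: int = TRAINING_STEPS) -> float:
    """Hypothetical helper: linear warmup from 0 to base_lr, then half-cosine decay to 0."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective train/eval batch:", total_train_batch_size, total_eval_batch_size)
    print("warmup steps:", warmup_steps)
    for s in (0, 6_250, 12_500, 131_250, 250_000):
        print(f"step {s:>7}: lr = {cosine_lr_with_warmup(s):.2e}")
```

Under these assumptions the peak rate of 0.0006 is reached at step 12,500. The rate then falls to half that value at the midpoint of the decay phase and reaches zero at step 250,000.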

Training results

Training Loss Epoch Step Validation Loss
0.689 0.04 1000 0.6793
0.6802 0.09 2000 0.6787
0.6795 0.13 3000 0.6788
0.679 0.18 4000 0.6782
0.6787 0.22 5000 0.6782
0.6786 0.27 6000 0.6781
0.6784 0.31 7000 0.6781
0.6783 0.36 8000 0.6781
0.6781 0.4 9000 0.6773
0.6775 0.45 10000 0.6778
0.6775 0.49 11000 0.6769
0.6773 0.54 12000 0.6773
0.6774 0.58 13000 0.6771
0.6773 0.62 14000 0.6772
0.6773 0.67 15000 0.6772
0.6772 0.71 16000 0.6776
0.6773 0.76 17000 0.6770
0.6772 0.8 18000 0.6775
0.6772 0.85 19000 0.6770
0.6774 0.89 20000 0.6770
0.6772 0.94 21000 0.6762
0.6773 0.98 22000 0.6775
0.6773 1.03 23000 0.6764
0.6772 1.07 24000 0.6768
0.6772 1.12 25000 0.6769
0.6772 1.16 26000 0.6775
0.6772 1.2 27000 0.6776
0.6772 1.25 28000 0.6772
0.6772 1.29 29000 0.6769
0.6773 1.34 30000 0.6772
0.6772 1.38 31000 0.6777
0.6772 1.43 32000 0.6769
0.6773 1.47 33000 0.6767
0.677 1.52 34000 0.6766
0.6765 1.56 35000 0.6766
0.6763 1.61 36000 0.6766
0.6764 1.65 37000 0.6758
0.6764 1.7 38000 0.6762
0.6758 1.74 39000 0.6771
0.6772 1.78 40000 0.6770
0.6575 1.83 41000 0.6465
0.6373 1.87 42000 0.6318
0.6257 1.92 43000 0.6184
0.621 1.96 44000 0.6136
0.6183 2.01 45000 0.6127
0.6165 2.05 46000 0.6103
0.612 2.1 47000 0.6013
0.6037 2.14 48000 0.5943
0.6 2.19 49000 0.5915
0.5973 2.23 50000 0.5881
0.5924 2.28 51000 0.5799
0.5817 2.32 52000 0.5670
0.5719 2.36 53000 0.5557
0.5651 2.41 54000 0.5477
0.5592 2.45 55000 0.5408
0.5534 2.5 56000 0.5362
0.5446 2.54 57000 0.5251
0.5342 2.59 58000 0.5130
0.5239 2.63 59000 0.5024
0.5147 2.68 60000 0.4947
0.5061 2.72 61000 0.4848
0.4981 2.77 62000 0.4746
0.4912 2.81 63000 0.4681
0.4847 2.86 64000 0.4599
0.4792 2.9 65000 0.4537
0.474 2.94 66000 0.4491
0.4688 2.99 67000 0.4437
0.464 3.03 68000 0.4392
0.4592 3.08 69000 0.4324
0.4547 3.12 70000 0.4284
0.4507 3.17 71000 0.4260
0.4468 3.21 72000 0.4192
0.4432 3.26 73000 0.4161
0.44 3.3 74000 0.4153
0.4367 3.35 75000 0.4102
0.4337 3.39 76000 0.4062
0.4311 3.44 77000 0.4019
0.4286 3.48 78000 0.4007
0.4259 3.52 79000 0.3997
0.4239 3.57 80000 0.3968
0.4218 3.61 81000 0.3949
0.4201 3.66 82000 0.3935
0.4182 3.7 83000 0.3926
0.4168 3.75 84000 0.3879
0.4155 3.79 85000 0.3885
0.4136 3.84 86000 0.3844
0.4124 3.88 87000 0.3855
0.4116 3.93 88000 0.3830
0.4098 3.97 89000 0.3837
0.4087 4.01 90000 0.3802
0.4078 4.06 91000 0.3799
0.4068 4.1 92000 0.3794
0.4057 4.15 93000 0.3784
0.4047 4.19 94000 0.3788
0.4047 4.24 95000 0.3770
0.4029 4.28 96000 0.3750
0.4022 4.33 97000 0.3747
0.4015 4.37 98000 0.3736
0.4007 4.42 99000 0.3752
0.4 4.46 100000 0.3743
0.3995 4.51 101000 0.3741
0.3985 4.55 102000 0.3702
0.3981 4.59 103000 0.3800
0.3986 4.64 104000 0.3734
0.3966 4.68 105000 0.3705
0.3957 4.73 106000 0.3680
0.3957 4.77 107000 0.3663
0.3948 4.82 108000 0.3683
0.3943 4.86 109000 0.3697
0.3936 4.91 110000 0.3672
0.3932 4.95 111000 0.3649
0.3925 5.0 112000 0.3651
0.3919 5.04 113000 0.3650
0.3915 5.09 114000 0.3636
0.3911 5.13 115000 0.3655
0.3905 5.17 116000 0.3650
0.3905 5.22 117000 0.4054
0.3894 5.26 118000 0.3609
0.3889 5.31 119000 0.3599
0.3888 5.35 120000 0.3593
0.3887 5.4 121000 0.3601
0.3883 5.44 122000 0.3611
0.6776 5.49 123000 0.6769
0.3917 5.53 124000 0.3626
0.3897 5.58 125000 0.3617
0.3869 5.62 126000 0.3578
0.3864 5.67 127000 0.3578
0.3862 5.71 128000 0.3573
0.3855 5.75 129000 0.3578
0.3854 5.8 130000 0.3571
0.3849 5.84 131000 0.3566
0.3845 5.89 132000 0.3569
0.384 5.93 133000 0.3567
0.3921 5.98 134000 0.3628
0.3844 6.02 135000 0.3565
0.383 6.07 136000 0.3547
0.3828 6.11 137000 0.3586
0.3824 6.16 138000 0.3553
0.3825 6.2 139000 0.3549
0.3818 6.25 140000 0.3537
0.3815 6.29 141000 0.3550
0.3812 6.33 142000 0.3539
0.3806 6.38 143000 0.3535
0.3804 6.42 144000 0.3533
0.3799 6.47 145000 0.3539
0.3799 6.51 146000 0.3528
0.3794 6.56 147000 0.3519
0.3792 6.6 148000 0.3501
0.3791 6.65 149000 0.3513
0.3784 6.69 150000 0.3511
0.3833 6.74 151000 0.3518
0.3805 6.78 152000 0.3513
0.3785 6.83 153000 0.3522
0.3772 6.87 154000 0.3493
0.3772 6.91 155000 0.3503
0.3771 6.96 156000 0.3513
0.3769 7.0 157000 0.3505
0.3766 7.05 158000 0.3499
0.3762 7.09 159000 0.3490
0.376 7.14 160000 0.3465
0.3756 7.18 161000 0.3490
0.3753 7.23 162000 0.3483
0.3749 7.27 163000 0.3481
0.3747 7.32 164000 0.3470
0.375 7.36 165000 0.3476
0.3742 7.41 166000 0.3471
0.3741 7.45 167000 0.3462
0.3738 7.49 168000 0.3470
0.3735 7.54 169000 0.3462
0.3736 7.58 170000 0.3467
0.3731 7.63 171000 0.3457
0.3726 7.67 172000 0.3478
0.3725 7.72 173000 0.3447
0.3722 7.76 174000 0.3459
0.3723 7.81 175000 0.3462
0.3718 7.85 176000 0.3464
0.3716 7.9 177000 0.3453
0.3712 7.94 178000 0.3466
0.3712 7.99 179000 0.3456
0.3709 8.03 180000 0.3452
0.3709 8.07 181000 0.3427
0.3707 8.12 182000 0.3445
0.3703 8.16 183000 0.3452
0.3701 8.21 184000 0.3420
0.3699 8.25 185000 0.3429
0.3697 8.3 186000 0.3432
0.3696 8.34 187000 0.3425
0.3696 8.39 188000 0.3437
0.3694 8.43 189000 0.3425
0.369 8.48 190000 0.3429
0.369 8.52 191000 0.3415
0.3685 8.57 192000 0.3431
0.3684 8.61 193000 0.3415
0.3683 8.65 194000 0.3421
0.368 8.7 195000 0.3422
0.3719 8.74 196000 0.3433
0.3678 8.79 197000 0.3400
0.3675 8.83 198000 0.3420
0.3676 8.88 199000 0.3426
0.3674 8.92 200000 0.3396
0.3673 8.97 201000 0.3404
0.3671 9.01 202000 0.3397
0.3669 9.06 203000 0.3417
0.3669 9.1 204000 0.3413
0.3666 9.15 205000 0.3386
0.3666 9.19 206000 0.3414
0.3664 9.23 207000 0.3407
0.3662 9.28 208000 0.3401
0.3661 9.32 209000 0.3412
0.366 9.37 210000 0.3374
0.3659 9.41 211000 0.3400
0.3658 9.46 212000 0.3406
0.3658 9.5 213000 0.3383
0.3656 9.55 214000 0.3399
0.3655 9.59 215000 0.3385
0.3653 9.64 216000 0.3406
0.3652 9.68 217000 0.3388
0.3674 9.73 218000 0.3381
0.365 9.77 219000 0.3387
0.3648 9.81 220000 0.3374
0.3649 9.86 221000 0.3378
0.3649 9.9 222000 0.3379
0.3646 9.95 223000 0.3382
0.3647 9.99 224000 0.3377
0.3644 10.04 225000 0.3351
0.3644 10.08 226000 0.3374
0.3644 10.13 227000 0.3379
0.3651 10.17 228000 0.3365
0.3643 10.22 229000 0.3360
0.3642 10.26 230000 0.3371
0.364 10.31 231000 0.3380
0.364 10.35 232000 0.3375
0.364 10.39 233000 0.3386
0.3639 10.44 234000 0.3373
0.364 10.48 235000 0.3377
0.3636 10.53 236000 0.3384
0.3636 10.57 237000 0.3367
0.3638 10.62 238000 0.3374
0.3637 10.66 239000 0.3368
0.3635 10.71 240000 0.3352
0.3635 10.75 241000 0.3393
0.3634 10.8 242000 0.3344
0.3635 10.84 243000 0.3383
0.3633 10.89 244000 0.3362
0.3635 10.93 245000 0.3353
0.3634 10.97 246000 0.3357
0.3632 11.02 247000 0.3375
0.3633 11.06 248000 0.3395
0.3635 11.11 249000 0.3382
0.3634 11.15 250000 0.3380
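Each row of the table above holds four whitespace-separated numbers: training loss, epoch, step, and validation loss. The sketch below parses rows in that layout and reports the row with the lowest validation loss. It also lists steps where validation loss jumps sharply over the previous row. The helper names and the 0.1 jump threshold are illustrative assumptions. Only a handful of rows are copied in, so paste the full table into `RAW` to analyse the whole log.

```python
from typing import List, NamedTuple


class LogRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float


# A few rows copied verbatim from the table above.
RAW = """\
0.6772 1.78 40000 0.6770
0.6575 1.83 41000 0.6465
0.3883 5.44 122000 0.3611
0.6776 5.49 123000 0.6769
0.3917 5.53 124000 0.3626
0.3634 10.8 242000 0.3344
0.3634 10.97 246000 0.3357
0.3634 11.15 250000 0.3380
"""


def parse_rows(text: str) -> List[LogRow]:
    """Parse 'train_loss epoch step val_loss' lines; skip blanks and the 6-token header."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        tl, ep, st, vl = parts
        rows.append(LogRow(float(tl), float(ep), int(st), float(vl)))
    return rows


def best_row(rows: List[LogRow]) -> LogRow:
    """Row with the lowest validation loss."""
    return min(rows, key=lambda r: r.val_loss)


def spikes(rows: List[LogRow], threshold: float = 0.1) -> List[int]:
    """Steps where validation loss rose by more than `threshold` over the previous row."""
    return [cur.step for prev, cur in zip(rows, rows[1:])
            if cur.val_loss - prev.val_loss > threshold]


if __name__ == "__main__":
    rows = parse_rows(RAW)
    b = best_row(rows)
    print(f"lowest validation loss {b.val_loss} at step {b.step}")
    print("validation-loss spikes at steps:", spikes(rows))
```

Over the full table, the lowest logged validation loss is 0.3344 at step 242000. The only rise above the 0.1 threshold is the excursion at step 123000, where both losses return to the early plateau level of about 0.677.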

Framework versions

  • Transformers 4.17.0
  • PyTorch 1.11.0
  • Datasets 2.1.1.dev0
  • Tokenizers 0.12.1