opt-babylm2-rewritten-clean-spacy-no-num-adj-earlystop-bpe_seed-42_1e-3

This model was trained from scratch on the kanishka/babylm2-rewritten-clean-spacy-no-num-adj dataset. It achieves the following results on the evaluation set:

  • Loss: 2.6943
  • Accuracy: 0.4781

Model description

More information needed

Intended uses & limitations

More information needed
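Pending a fuller description, the checkpoint can be loaded like any causal language model on the Hub. A minimal usage sketch, assuming the standard transformers Auto classes (the prompt and generation settings below are illustrative, not from the training recipe):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "kanishka/opt-babylm2-rewritten-clean-spacy-no-num-adj-earlystop-bpe_seed-42_1e-3"

def generate(prompt: str, max_new_tokens: int = 30) -> str:
    """Greedy-decode a continuation of `prompt` with this checkpoint."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# generate("The little boy went to the")  # downloads the ~97.8M-param weights on first call
```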

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 256
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 32000
  • num_epochs: 20.0
  • mixed_precision_training: Native AMP
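A few of the settings above are derived quantities; a small sketch of how they relate (steps per epoch taken from the results table below):

```python
# Effective batch size = per-device batch size x gradient accumulation steps
train_batch_size = 32
gradient_accumulation_steps = 8
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 256

# 2226 optimizer steps per epoch over 20 epochs (the run logs 44500 at epoch ~19.99)
steps_per_epoch = 2226
num_epochs = 20
total_steps = steps_per_epoch * num_epochs  # 44520

# 32000 warmup steps cover roughly 72% of training under the linear schedule
warmup_fraction = 32000 / total_steps
```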

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|:-------------:|:-------:|:-----:|:---------------:|:--------:|
| 32.3624       | 1.0     | 2226  | 3.8114          | 0.3616   |
| 27.3451       | 2.0     | 4452  | 3.3046          | 0.4098   |
| 24.8714       | 3.0     | 6678  | 3.0916          | 0.4308   |
| 23.5806       | 4.0     | 8904  | 2.9866          | 0.4411   |
| 22.612        | 5.0     | 11130 | 2.9237          | 0.4477   |
| 22.1721       | 6.0     | 13356 | 2.8891          | 0.4514   |
| 21.8434       | 7.0     | 15582 | 2.8607          | 0.4547   |
| 21.5531       | 8.0     | 17808 | 2.8414          | 0.4568   |
| 21.4295       | 9.0     | 20034 | 2.8298          | 0.4583   |
| 21.2648       | 10.0    | 22260 | 2.8171          | 0.4595   |
| 21.1394       | 11.0    | 24486 | 2.8105          | 0.4605   |
| 21.0222       | 12.0    | 26712 | 2.8037          | 0.4610   |
| 20.8781       | 13.0    | 28938 | 2.8000          | 0.4616   |
| 20.9859       | 14.0    | 31164 | 2.7974          | 0.4623   |
| 20.8891       | 15.0    | 33390 | 2.7767          | 0.4643   |
| 20.506        | 16.0    | 35616 | 2.7514          | 0.4677   |
| 20.0708       | 17.0    | 37842 | 2.7303          | 0.4711   |
| 19.6306       | 18.0    | 40068 | 2.7094          | 0.4740   |
| 19.133        | 19.0    | 42294 | 2.6957          | 0.4767   |
| 18.553        | 19.9911 | 44500 | 2.6943          | 0.4781   |
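The final validation loss translates directly into perplexity, a common way to report language-model quality:

```python
import math

# Perplexity is exp(cross-entropy loss); final validation loss from the table above
val_loss = 2.6943
perplexity = math.exp(val_loss)  # roughly 14.8
```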

Framework versions

  • Transformers 4.47.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.21.0
