opt-babylm2-rewritten-clean-spacy-random_removal_numadj-earlystop-bpe_seed-42_1e-3

This model was trained from scratch on the kanishka/babylm2-rewritten-clean-spacy-random_removal_numadj dataset. It achieves the following results on the evaluation set:

  • Loss: 2.6927
  • Accuracy: 0.4781
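
To try the checkpoint, it can be loaded with the standard transformers causal-LM classes. The snippet below is a minimal sketch: the repository id matches this card, but the prompt and sampling settings are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id of this model on the Hugging Face Hub.
repo_id = "kanishka/opt-babylm2-rewritten-clean-spacy-random_removal_numadj-earlystop-bpe_seed-42_1e-3"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Illustrative prompt: this is a small from-scratch BabyLM model, so expect
# child-directed-style continuations rather than instruction following.
inputs = tokenizer("The little dog", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```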

Model description

An OPT architecture causal language model (roughly 98.1M parameters, stored as F32 safetensors), trained from scratch on the dataset named above.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 0.001
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 256
  • optimizer: AdamW (ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 32000
  • num_epochs: 20.0
  • mixed_precision_training: Native AMP
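
For reference, the following is a hypothetical transformers.TrainingArguments configuration that mirrors the settings above, assuming the run used the standard Trainer; the actual training script is not included in this card.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above;
# output_dir is illustrative, everything else mirrors the card.
training_args = TrainingArguments(
    output_dir="opt-babylm2-random_removal_numadj",  # assumed path
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    seed=42,
    gradient_accumulation_steps=8,  # 32 x 8 = 256 total train batch size
    lr_scheduler_type="linear",
    warmup_steps=32000,
    num_train_epochs=20.0,
    fp16=True,  # "Native AMP" mixed-precision training
)
```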

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|:-------------:|:-------:|:-----:|:---------------:|:--------:|
| 32.5094       | 0.9997  | 2243  | 3.8102          | 0.3615   |
| 27.4934       | 1.9997  | 4486  | 3.2932          | 0.4103   |
| 24.9364       | 2.9997  | 6729  | 3.0832          | 0.4316   |
| 23.6407       | 3.9997  | 8972  | 2.9806          | 0.4416   |
| 22.7053       | 4.9997  | 11215 | 2.9227          | 0.4477   |
| 22.2601       | 5.9997  | 13458 | 2.8859          | 0.4513   |
| 21.9123       | 6.9997  | 15701 | 2.8600          | 0.4546   |
| 21.6403       | 7.9997  | 17944 | 2.8425          | 0.4570   |
| 21.5087       | 8.9997  | 20187 | 2.8276          | 0.4585   |
| 21.3483       | 9.9997  | 22430 | 2.8189          | 0.4596   |
| 21.2068       | 10.9997 | 24673 | 2.8091          | 0.4604   |
| 21.0757       | 11.9997 | 26916 | 2.8028          | 0.4610   |
| 21.12         | 12.9997 | 29159 | 2.7997          | 0.4619   |
| 21.0442       | 13.9997 | 31402 | 2.7952          | 0.4622   |
| 20.9217       | 14.9997 | 33645 | 2.7750          | 0.4649   |
| 20.5419       | 15.9997 | 35888 | 2.7506          | 0.4683   |
| 20.1666       | 16.9997 | 38131 | 2.7245          | 0.4714   |
| 19.7172       | 17.9997 | 40374 | 2.7101          | 0.4740   |
| 19.1888       | 18.9997 | 42617 | 2.6955          | 0.4768   |
| 18.63         | 19.9997 | 44860 | 2.6927          | 0.4781   |
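
Assuming the reported validation loss is mean per-token cross-entropy in nats (the usual convention for these auto-generated cards), the final checkpoint's perplexity follows directly:

```python
import math

# Perplexity = exp(mean cross-entropy), assuming the loss is reported in nats.
final_val_loss = 2.6927
print(math.exp(final_val_loss))  # ~14.77
```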

Framework versions

  • Transformers 4.47.1
  • PyTorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.21.0
