---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-large
tags:
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: ModernBERT-large-zeroshot-v2.0
    results: []
---

# ModernBERT-large-zeroshot-v2.0

## Model description

This model is answerdotai/ModernBERT-large fine-tuned on the same dataset mix as the zeroshot-v2.0 models in the Zeroshot Classifiers Collection.

General takeaways:

- The model is very fast and memory efficient. It is several times faster and uses several times less memory than DeBERTaV3, and the lower memory footprint enables larger batch sizes. I got a ~2x speed increase by enabling bf16 (instead of fp16); see the usage sketch after this list.
- It performs slightly worse than DeBERTaV3 on average on the tasks tested below.
- I'm preparing a newer version trained on better synthetic data to make full use of the 8k context window and to update the training mix of the older zeroshot-v2.0 models.
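
For illustration, here is a minimal zero-shot classification sketch with the Transformers pipeline. The repo id (assumed to match the model name above), example text, and candidate labels are placeholders for demonstration; bf16 is enabled to reflect the speedup noted in the takeaways.

```python
from transformers import pipeline
import torch

# Assumed repo id, matching the model name above; example text and labels are made up.
model_id = "MoritzLaurer/ModernBERT-large-zeroshot-v2.0"

zeroshot_classifier = pipeline(
    "zero-shot-classification",
    model=model_id,
    torch_dtype=torch.bfloat16,  # bf16 gave a ~2x speed increase in the tests above
    device_map="auto",
)

text = "Angela Merkel is a politician in Germany and leader of the CDU"
candidate_labels = ["politics", "economy", "entertainment", "environment"]
hypothesis_template = "This text is about {}"

output = zeroshot_classifier(
    text,
    candidate_labels,
    hypothesis_template=hypothesis_template,
    multi_label=False,
)
print(output)
```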

## Training results

| Datasets | Mean | Mean w/o NLI | mnli_m | mnli_mm | fevernli | anli_r1 | anli_r2 | anli_r3 | wanli | lingnli | wellformedquery | rottentomatoes | amazonpolarity | imdb | yelpreviews | hatexplain | massive | banking77 | emotiondair | emocontext | empathetic | agnews | yahootopics | biasframes_sex | biasframes_offensive | biasframes_intent | financialphrasebank | appreviews | hateoffensive | trueteacher | spam | wikitoxic_toxicaggregated | wikitoxic_obscene | wikitoxic_identityhate | wikitoxic_threat | wikitoxic_insult | manifesto | capsotu |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | 0.85 | 0.851 | 0.942 | 0.944 | 0.894 | 0.812 | 0.717 | 0.716 | 0.836 | 0.909 | 0.815 | 0.899 | 0.964 | 0.951 | 0.984 | 0.814 | 0.8 | 0.744 | 0.752 | 0.802 | 0.544 | 0.899 | 0.735 | 0.934 | 0.864 | 0.877 | 0.913 | 0.953 | 0.921 | 0.821 | 0.989 | 0.901 | 0.927 | 0.931 | 0.959 | 0.911 | 0.497 | 0.73 |
| F1 macro | 0.834 | 0.835 | 0.935 | 0.938 | 0.882 | 0.795 | 0.688 | 0.676 | 0.823 | 0.898 | 0.814 | 0.899 | 0.964 | 0.951 | 0.984 | 0.77 | 0.753 | 0.763 | 0.69 | 0.805 | 0.533 | 0.899 | 0.729 | 0.925 | 0.864 | 0.877 | 0.901 | 0.953 | 0.855 | 0.821 | 0.983 | 0.901 | 0.927 | 0.931 | 0.952 | 0.911 | 0.362 | 0.662 |
| Inference text/sec (A100 40GB GPU, batch=32) | 1116.0 | 1104.0 | 1039.0 | 1241.0 | 1138.0 | 1102.0 | 1124.0 | 1133.0 | 1251.0 | 1240.0 | 1263.0 | 1231.0 | 1054.0 | 559.0 | 795.0 | 1238.0 | 1312.0 | 1285.0 | 1273.0 | 1268.0 | 992.0 | 1222.0 | 894.0 | 1176.0 | 1194.0 | 1197.0 | 1206.0 | 1166.0 | 1227.0 | 541.0 | 1199.0 | 1045.0 | 1054.0 | 1020.0 | 1005.0 | 1063.0 | 1214.0 | 1220.0 |

## Training hyperparameters

The following hyperparameters were used during training; a hedged `TrainingArguments` sketch is shown after the list:

- learning_rate: 9e-06
- train_batch_size: 16
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.06
- num_epochs: 2
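
As a rough illustration, the settings above map to the following `TrainingArguments` sketch. The output directory and the `bf16` flag are assumptions (the exact training script is not shown here); the remaining values mirror the list above.

```python
from transformers import TrainingArguments

# Hedged sketch: mirrors the hyperparameters listed above.
# output_dir is a placeholder and bf16=True is an assumption based on the takeaways.
training_args = TrainingArguments(
    output_dir="ModernBERT-large-zeroshot-v2.0",  # placeholder
    learning_rate=9e-6,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,  # effective train batch size: 16 * 2 = 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.06,
    num_train_epochs=2,
    bf16=True,  # assumption; the takeaways above note a ~2x speedup from bf16
)
```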

## Framework versions

- Transformers 4.48.0.dev0
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0