ModernBERT-large-zeroshot-v2.0

Model description

This model is answerdotai/ModernBERT-large fine-tuned on the same dataset mix as the zeroshot-v2.0 models in the Zeroshot Classifiers Collection.

General takeaways:

  • The model is very fast and memory-efficient: it is several times faster than DeBERTa-v3 and consumes several times less memory, which enables larger batch sizes. I also got a ~2x speed increase by enabling bf16 (instead of fp16); see the usage sketch after this list.
  • On average it performs slightly worse than DeBERTa-v3 on the tasks tested below.
  • I'm in the process of preparing a newer version trained on better synthetic data to make full use of the 8k context window and to update the training mix of the older zeroshot-v2.0 models.
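
A minimal usage sketch (my illustration, not an official snippet) with the zero-shot classification pipeline, loading the weights in bf16 as discussed above:

```python
# Minimal usage sketch; assumes a CUDA GPU and a recent transformers version.
import torch
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/ModernBERT-large-zeroshot-v2.0",
    torch_dtype=torch.bfloat16,  # bf16 was ~2x faster than fp16 in my tests
    device=0,                    # set device=-1 to run on CPU
)

text = "The new update made the app slower and harder to use."
candidate_labels = ["positive", "negative", "neutral"]
result = classifier(text, candidate_labels)
print(result["labels"][0], round(result["scores"][0], 3))
```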

Training results

| Dataset | Accuracy | F1 macro | Text/sec (A100 40GB GPU, batch=32) |
|---|---|---|---|
| Mean | 0.85 | 0.834 | 1116.0 |
| Mean w/o NLI | 0.851 | 0.835 | 1104.0 |
| mnli_m | 0.942 | 0.935 | 1039.0 |
| mnli_mm | 0.944 | 0.938 | 1241.0 |
| fevernli | 0.894 | 0.882 | 1138.0 |
| anli_r1 | 0.812 | 0.795 | 1102.0 |
| anli_r2 | 0.717 | 0.688 | 1124.0 |
| anli_r3 | 0.716 | 0.676 | 1133.0 |
| wanli | 0.836 | 0.823 | 1251.0 |
| lingnli | 0.909 | 0.898 | 1240.0 |
| wellformedquery | 0.815 | 0.814 | 1263.0 |
| rottentomatoes | 0.899 | 0.899 | 1231.0 |
| amazonpolarity | 0.964 | 0.964 | 1054.0 |
| imdb | 0.951 | 0.951 | 559.0 |
| yelpreviews | 0.984 | 0.984 | 795.0 |
| hatexplain | 0.814 | 0.77 | 1238.0 |
| massive | 0.8 | 0.753 | 1312.0 |
| banking77 | 0.744 | 0.763 | 1285.0 |
| emotiondair | 0.752 | 0.69 | 1273.0 |
| emocontext | 0.802 | 0.805 | 1268.0 |
| empathetic | 0.544 | 0.533 | 992.0 |
| agnews | 0.899 | 0.899 | 1222.0 |
| yahootopics | 0.735 | 0.729 | 894.0 |
| biasframes_sex | 0.934 | 0.925 | 1176.0 |
| biasframes_offensive | 0.864 | 0.864 | 1194.0 |
| biasframes_intent | 0.877 | 0.877 | 1197.0 |
| financialphrasebank | 0.913 | 0.901 | 1206.0 |
| appreviews | 0.953 | 0.953 | 1166.0 |
| hateoffensive | 0.921 | 0.855 | 1227.0 |
| trueteacher | 0.821 | 0.821 | 541.0 |
| spam | 0.989 | 0.983 | 1199.0 |
| wikitoxic_toxicaggregated | 0.901 | 0.901 | 1045.0 |
| wikitoxic_obscene | 0.927 | 0.927 | 1054.0 |
| wikitoxic_identityhate | 0.931 | 0.931 | 1020.0 |
| wikitoxic_threat | 0.959 | 0.952 | 1005.0 |
| wikitoxic_insult | 0.911 | 0.911 | 1063.0 |
| manifesto | 0.497 | 0.362 | 1214.0 |
| capsotu | 0.73 | 0.662 | 1220.0 |
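
The throughput column was measured on a single A100 40GB GPU at batch size 32. As a rough, illustrative sketch (not the exact benchmark script), throughput of this kind can be estimated like so:

```python
# Illustrative throughput estimate; real numbers depend on hardware,
# sequence length, and batch size (the table used an A100 40GB, batch=32).
import time
import torch
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/ModernBERT-large-zeroshot-v2.0",
    torch_dtype=torch.bfloat16,
    device=0,
)

texts = ["This movie was surprisingly good."] * 1024  # dummy corpus
candidate_labels = ["positive", "negative"]

start = time.perf_counter()
_ = classifier(texts, candidate_labels, batch_size=32)
elapsed = time.perf_counter() - start
print(f"{len(texts) / elapsed:.0f} texts/sec")
```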

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 9e-06
  • train_batch_size: 16
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.06
  • num_epochs: 2
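
As an illustration of how these settings map onto transformers TrainingArguments (a sketch only; the exact training script is not part of this card):

```python
# Sketch: the hyperparameters above expressed as TrainingArguments.
# output_dir is hypothetical; adamw_torch uses betas=(0.9, 0.999) and
# epsilon=1e-08 by default, matching the values listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./modernbert-zeroshot",  # hypothetical path
    learning_rate=9e-6,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,   # total train batch size: 32
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="linear",
    warmup_ratio=0.06,
    num_train_epochs=2,
    bf16=True,                       # assumption, based on the bf16 notes above
)
```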

Framework versions

  • Transformers 4.48.0.dev0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0