Finally, a Replacement for BERT: Introducing ModernBERT
On entailment-adjacent tasks (which, btw, great work on the zero-shot NLI models @MoritzLaurer!), I'd expect DeBERTa to be slightly better than ModernBERT -- its pretraining objective seems better aligned with those tasks. In our evals, DeBERTa consistently came out on top on MNLI (there's a full GLUE table in the appendix of the paper); it's only on aggregated GLUE that ModernBERT-Base beat DeBERTaV3-Base.
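For readers unfamiliar with the zero-shot NLI setup being referenced, here's a minimal sketch using the Hugging Face `transformers` zero-shot-classification pipeline. The checkpoint name is one of MoritzLaurer's public NLI models; any NLI-finetuned encoder could be swapped in, and the example sentence and labels are just illustrative:

```python
# Minimal sketch: zero-shot classification via an NLI model.
# The pipeline poses each candidate label as a hypothesis
# ("This example is about {label}.") and scores entailment
# between the input text (premise) and that hypothesis.
from transformers import pipeline

# One of MoritzLaurer's public NLI checkpoints; any NLI-finetuned
# encoder could be substituted here.
classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli",
)

result = classifier(
    "ModernBERT supports sequence lengths of up to 8192 tokens.",
    candidate_labels=["machine learning", "cooking", "politics"],
)

# Labels come back sorted by entailment probability, highest first.
print(result["labels"][0], round(result["scores"][0], 3))
```

This is why pretraining/finetuning alignment with entailment matters here: the zero-shot classifier is literally running an NLI head under the hood, so a model that is stronger on MNLI tends to be stronger at this kind of zero-shot classification too.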