ModernBERT-Large Rerank for Article Query Relevance Classification

Model Description

This model is a fine-tuned variant of answerdotai/ModernBERT-large (~396M parameters), leveraging ModernBERT’s extended context window (up to 8,192 tokens) and its speed and memory improvements. Its task is Article Query Relevance Classification: given a query and an article, it predicts one of three relevance labels:

  • irrelevant
  • partial
  • sufficient

ModernBERT’s integration with FlashAttention makes training and inference more efficient, particularly on NVIDIA GPUs with compute capability 8.0+.

For NVIDIA GPUs with compute capability 8.0+ (Ampere, Ada, or Hopper architectures: A100, A6000, RTX 3090, RTX 4090, H100, etc.):

pip install flash-attn --no-build-isolation

For older NVIDIA GPUs (pre-Ampere):

pip install flash-attn --no-deps

FlashAttention is strictly necessary here because the model was trained with it. Installing flash-attn requires a CUDA version greater than 11.7.
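
If you are unsure which install command applies to your GPU, you can check its compute capability with PyTorch:

import torch

# Ampere and newer GPUs report compute capability (8, 0) or higher.
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: {major}.{minor}")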

Intended Use

  • Primary Application: Classifying article–query pairs by relevance so that retrieved articles can be reranked for a user query.
  • Out-of-Scope Use: This model was trained on the Britannica/rerank_classification dataset of article–query pairs. Its performance on unrelated tasks has not been evaluated.

Dataset

Source & Structure

The model was fine-tuned on the Britannica/rerank_classification dataset, derived from the “Britannica Chatbot data dump of search topics.” It consists of three splits (train, validation, test), where each row includes:

  • premise: The query text.
  • hypothesis: The article text.
  • label: The classification of relevance (irrelevant, partial, or sufficient).

For training, the dataset files (rerank_classification_train_v3.csv, rerank_classification_test_v3.csv, etc.) were loaded via Hugging Face Datasets. Labels were integer-mapped (0: irrelevant, 1: partial, 2: sufficient).
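
A minimal sketch of that loading step, assuming local CSV files with the names above (the validation file is loaded the same way):

from datasets import load_dataset

label2id = {"irrelevant": 0, "partial": 1, "sufficient": 2}

# Load the CSV splits named in this card; add the validation file analogously.
dataset = load_dataset(
    "csv",
    data_files={
        "train": "rerank_classification_train_v3.csv",
        "test": "rerank_classification_test_v3.csv",
    },
)

# Map the string labels to the integer ids used for training.
dataset = dataset.map(lambda row: {"label": label2id[row["label"]]})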

Training Procedure

  • Base Model: answerdotai/ModernBERT-large
  • Max Sequence Length: 8192 tokens
  • Training Hyperparameters (selected highlights):
    • Batch Size: 4 for train, 2 for eval
    • Learning Rate: 1e-5
    • Number of Epochs: 4
    • Weight Decay: 8e-6
    • Adam Betas: (0.9, 0.98)
    • Loss: Cross-entropy
    • Precision: bf16
  • Optimizer: AdamW
  • Hardware: A single NVIDIA RTX 3090 was used, but any Ampere or newer GPU with sufficient memory is suitable.
  • Implementation: Fine-tuned using the Hugging Face Trainer API with a data collator (DataCollatorWithPadding) to handle variable input lengths.
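
A minimal sketch of this setup, assuming the `dataset` object from the previous section (the output directory and the choice of eval split are illustrative, not values from this card):

from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
    DataCollatorWithPadding,
)

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-large", num_labels=3
)

def tokenize(batch):
    # Pair the query (premise) with the article (hypothesis), truncating at 8,192 tokens.
    return tokenizer(batch["premise"], batch["hypothesis"], truncation=True, max_length=8192)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="modernbert-large-rerank",  # assumed path
    per_device_train_batch_size=4,
    per_device_eval_batch_size=2,
    learning_rate=1e-5,
    num_train_epochs=4,
    weight_decay=8e-6,
    adam_beta1=0.9,
    adam_beta2=0.98,
    bf16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],  # substitute the validation split if loaded
    data_collator=DataCollatorWithPadding(tokenizer),  # pads each batch to its longest sequence
)
trainer.train()

Note that the Trainer’s defaults (AdamW optimizer; cross-entropy loss for single-label classification) match the optimizer and loss listed above.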

Evaluation

Below is the classification report on the validation data for the Article Query Relevance task:

              precision    recall  f1-score   support

  irrelevant       0.97      0.86      0.91       230
     partial       0.82      0.93      0.87       178
  sufficient       0.96      0.96      0.96       182

    accuracy                           0.91       590
   macro avg       0.91      0.92      0.91       590
weighted avg       0.92      0.91      0.91       590

Confusion matrix (rows: true labels, columns: predicted labels, in the order irrelevant, partial, sufficient):

[[198  30   2]
 [  7 165   6]
 [  0   7 175]]

The model achieves around 91% accuracy overall.
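
For reference, a report in this format can be produced with scikit-learn, assuming the `trainer` and `tokenized` objects from the training sketch above and substituting the validation split this card evaluated on:

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

preds = trainer.predict(tokenized["test"])  # substitute the validation split
y_pred = np.argmax(preds.predictions, axis=-1)
print(classification_report(preds.label_ids, y_pred,
                            target_names=["irrelevant", "partial", "sufficient"]))
print(confusion_matrix(preds.label_ids, y_pred))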

How to Use

You can use this model directly with the transformers library. Until the next transformers release, doing so requires installing transformers from main:

pip install git+https://github.com/huggingface/transformers.git

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("Britannica/modernBERT-large-rerank")
model = AutoModelForSequenceClassification.from_pretrained("Britannica/modernBERT-large-rerank")
model.eval()

def classify_query_article(query, article):
    # Encode the query/article pair, truncating at the model's 8,192-token limit.
    # Padding is unnecessary for a single pair, so it is omitted here.
    inputs = tokenizer(query, article, return_tensors="pt", truncation=True, max_length=8192)
    with torch.no_grad():
        outputs = model(**inputs)
    # Return the predicted integer label (0: irrelevant, 1: partial, 2: sufficient).
    return torch.argmax(outputs.logits, dim=1).item()
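
For example (the query and article strings below are illustrative, and id2label simply inverts the integer mapping from the Dataset section):

id2label = {0: "irrelevant", 1: "partial", 2: "sufficient"}

query = "Who was the first president of the United States?"
article = "George Washington was an American officer and statesman who served as the first president of the United States."
print(id2label[classify_query_article(query, article)])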

Limitations & Considerations

  • Domain Specificity: The dataset is oriented toward article–query pairs from Britannica content, so accuracy may degrade on queries or articles from other domains.
  • Hardware Requirements: For best performance, a GPU with sufficient memory (e.g., A100, RTX 3090, RTX 4090) is recommended.