ModernBERT-Large Rerank for Article Query Relevance Classification
Model Description
This model is a fine-tuned variant of answerdotai/ModernBERT-large, leveraging ModernBERT’s extended context window (up to 8,192 tokens) and its speed/memory improvements. The primary task is Article Query Relevance Classification, where it determines how relevant an article is to a given query, using labels:
- irrelevant
- partial
- sufficient
ModernBERT’s integration with FlashAttention makes training and inference more efficient, particularly on NVIDIA GPUs with compute capability 8.0+.
For NVIDIA GPUs with compute capability 8.0+ (Ampere/Ada/Hopper architectures: A100, A6000, RTX 3090, RTX 4090, H100, etc.):
pip install flash-attn --no-build-isolation
For older NVIDIA GPUs (pre-Ampere):
pip install flash-attn --no-deps
FlashAttention is strictly necessary, as the model was trained with it. Installing FlashAttention requires CUDA 11.7 or later.
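If flash-attn is installed, the attention implementation can be requested explicitly when loading the model. A minimal sketch; the attn_implementation and torch_dtype arguments are standard transformers options rather than anything specific to this card:

import torch
from transformers import AutoModelForSequenceClassification

# Request FlashAttention 2 explicitly; omit attn_implementation on GPUs that
# do not support it, and transformers will use its default attention instead.
model = AutoModelForSequenceClassification.from_pretrained(
    "Britannica/modernBERT-large-rerank",
    torch_dtype=torch.bfloat16,               # matches the bf16 training precision listed below
    attn_implementation="flash_attention_2",
)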
Intended Use
- Primary Application: Classifying article–query pairs by relevance so that candidate articles can be reranked for a user query.
- Out-of-Scope Use: This model was trained on the Britannica/rerank_classification dataset of article–query pairs; its performance on unrelated tasks or domains has not been evaluated.
Dataset
Source & Structure
The model was fine-tuned on the Britannica/rerank_classification dataset, derived from the “Britannica Chatbot data dump of search topics.” It consists of three splits (train, validation, test), where each row includes:
- premise: The query text.
- hypothesis: The article text.
- label: The classification of relevance (irrelevant, partial, or sufficient).
For training, the dataset files (rerank_classification_train_v3.csv, rerank_classification_test_v3.csv, etc.) were loaded via Hugging Face Datasets. Labels were integer-mapped (0: irrelevant, 1: partial, 2: sufficient), as sketched below.
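A minimal loading sketch for this step; the CSV file names follow those mentioned above, and the column names match the premise/hypothesis/label schema:

from datasets import load_dataset

data_files = {
    "train": "rerank_classification_train_v3.csv",
    "test": "rerank_classification_test_v3.csv",
}
dataset = load_dataset("csv", data_files=data_files)

# Map the string labels to the integer ids used during fine-tuning.
label2id = {"irrelevant": 0, "partial": 1, "sufficient": 2}
dataset = dataset.map(lambda ex: {"label": label2id[ex["label"]]})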
Training Procedure
- Base Model: answerdotai/ModernBERT-large
- Max Sequence Length: 8192 tokens
- Training Hyperparameters (selected highlights):
  - Batch Size: 4 for train, 2 for eval
  - Learning Rate: 1e-5
  - Number of Epochs: 4
  - Weight Decay: 8e-6
  - Adam Betas: (0.9, 0.98)
  - Loss: Cross-entropy
  - Precision: bf16
  - Optimizer: AdamW
- Hardware: A single NVIDIA RTX 3090 was used, but any Ampere or newer GPU with sufficient memory is suitable.
- Implementation: Fine-tuned using the Hugging Face Trainer API with a data collator (DataCollatorWithPadding) to handle variable input lengths. A sketch of this setup follows.
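The sketch below assembles the fine-tuning setup implied by the hyperparameters above. It reuses the dataset object from the loading sketch and is an illustration, not the exact script used to produce this checkpoint:

from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    DataCollatorWithPadding,
    TrainingArguments,
    Trainer,
)

model_name = "answerdotai/ModernBERT-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# Queries (premise) and articles (hypothesis) are encoded as sentence pairs.
def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"], truncation=True, max_length=8192)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="modernbert-large-rerank",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=2,
    learning_rate=1e-5,
    num_train_epochs=4,
    weight_decay=8e-6,
    adam_beta1=0.9,
    adam_beta2=0.98,
    bf16=True,   # AdamW and cross-entropy loss are the Trainer defaults for this setup
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],  # stands in for the validation split described above
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()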
Evaluation
Below is the classification report on the validation data for the Article Query Relevance task:
              precision  recall  f1-score  support
irrelevant         0.97    0.86      0.91      230
partial            0.82    0.93      0.87      178
sufficient         0.96    0.96      0.96      182
accuracy                             0.91      590
macro avg          0.91    0.92      0.91      590
weighted avg       0.92    0.91      0.91      590
Confusion Matrix (rows: true label, columns: predicted label, in the order irrelevant, partial, sufficient):
[[198  30   2]
 [  7 165   6]
 [  0   7 175]]
The model achieves 91% overall accuracy on the validation data; most misclassifications involve the irrelevant and partial classes.
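For reference, a report like the one above can be generated from Trainer predictions with scikit-learn. This is an illustration rather than the original evaluation script:

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

predictions = trainer.predict(tokenized["test"])
y_pred = np.argmax(predictions.predictions, axis=-1)
y_true = predictions.label_ids

target_names = ["irrelevant", "partial", "sufficient"]
print(classification_report(y_true, y_pred, target_names=target_names))
print(confusion_matrix(y_true, y_pred))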
How to Use
You can use this model directly with the transformers library. Until the next transformers release, doing so requires installing transformers from main:
pip install git+https://github.com/huggingface/transformers.git
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("Britannica/modernBERT-large-rerank")
model = AutoModelForSequenceClassification.from_pretrained("Britannica/modernBERT-large-rerank")
model.eval()

def classify_query_article(query, article):
    # The query is passed as the first segment and the article as the second,
    # matching the premise/hypothesis layout used during training.
    inputs = tokenizer(query, article, return_tensors="pt", truncation=True,
                       padding="max_length", max_length=8192)
    with torch.no_grad():
        outputs = model(**inputs)
    # Returns the integer class id: 0 = irrelevant, 1 = partial, 2 = sufficient.
    predicted_label = torch.argmax(outputs.logits, dim=1).item()
    return predicted_label
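For example (the query, article, and candidate list below are illustrative placeholders, not items from the dataset):

id2label = {0: "irrelevant", 1: "partial", 2: "sufficient"}

query = "Who discovered penicillin?"
article = "Alexander Fleming was a Scottish physician and microbiologist..."
print(id2label[classify_query_article(query, article)])

To use the classifier as a reranker, one option is to sort candidate articles by the predicted probability of the sufficient class:

import torch.nn.functional as F

def relevance_score(query, article):
    inputs = tokenizer(query, article, return_tensors="pt", truncation=True, max_length=8192)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Probability of class 2 ("sufficient") serves as the ranking score.
    return F.softmax(logits, dim=-1)[0, 2].item()

candidates = [article, "Penicillin is a group of antibiotics derived from Penicillium moulds..."]
ranked = sorted(candidates, key=lambda a: relevance_score(query, a), reverse=True)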
Limitations & Considerations
- Domain Specificity: The dataset is oriented toward article–query pairs from Britannica content, so accuracy may drop on out-of-domain text.
- Hardware Requirements: For best performance, a GPU with sufficient memory (e.g., A100, RTX 3090, RTX 4090) is recommended.