Training Data?

#32
by binarymax - opened

Hi! Excellent work on this model. Can you please share more information on the training data used? The sources are quite vague, and it would be good to know more specifics to understand which content and domains the model aligns with better than others.

Hello,

Unfortunately, this is the most we can share about the data; I am deeply sorry about this.
Hopefully the broad domains and the experiments give some signal about the domains ModernBERT is aligned with; the contents themselves should be quite diverse.
