[Feature] Remove "model voting"

#1059
by T145 - opened

The times I've most enjoyed using the leaderboard have been when the voting mechanism was disabled and everyone could benchmark their models efficiently. In other words, keeping the process "first come, first served" seems to be the best option.

I agree. This way, large corporations with a huge audience have a much better chance of getting their model evaluated first, which disadvantages ordinary users who just want to test their finetune or merge and show it to the community.

Open LLM Leaderboard org

Hi!
We thought a lot before adding the model voting system, and we will not remove it: before it existed, a number of users were abusing the system, submitting dozens of models, creating new organizations or users to submit all their models at once, etc. Flagging this kind of behavior manually is very hard and time-consuming.

On the other hand, encouraging users to vote for the models they want to see evaluated allows us to prioritize the models that matter most to the community, instead of evaluating the 30 models of "bob_does_hyperparameter_search" just because he submitted them first. As these evaluations are run for free on our spare cycles, it's important to have some sort of priority system for the rare cases when we're running low on compute, which only happens a couple of times per month anyway.

Side note: models from "large corporations" are already evaluated with priority. If Meta, Qwen, Mistral, DeepSeek, Eleuther, etc. release a model, it automatically goes into a priority queue: a Qwen4 (for example) will be a building block that the rest of the community uses to create new models, so it's important to know its performance quickly.
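
For context, the prioritization described above is easy to picture as a small priority queue: trusted orgs skip the line, then vote count breaks ties, then submission time. This is only an illustrative sketch with made-up names (`EvalQueue`, `TRUSTED_ORGS`, the example model IDs), not the leaderboard's actual backend code:

```python
import heapq
from time import time

# Hypothetical set of organizations whose releases are evaluated with priority
# (Hub IDs of the orgs mentioned above; purely illustrative).
TRUSTED_ORGS = {"meta-llama", "Qwen", "mistralai", "deepseek-ai", "EleutherAI"}

class EvalQueue:
    """Orders pending submissions: trusted orgs first, then by votes, then by age."""

    def __init__(self):
        self._heap = []  # heapq pops the smallest tuple first

    def add(self, model_id: str, votes: int = 0, submitted_at: float | None = None):
        org = model_id.split("/")[0]
        submitted_at = time() if submitted_at is None else submitted_at
        # Negate "goodness" so that higher-priority items sort as smaller tuples.
        key = (org not in TRUSTED_ORGS, -votes, submitted_at)
        heapq.heappush(self._heap, (key, model_id))

    def next_to_evaluate(self) -> str:
        _, model_id = heapq.heappop(self._heap)
        return model_id

# Example: the trusted-org release jumps ahead, then the community-voted merge.
queue = EvalQueue()
queue.add("bob/hyperparam-run-17", votes=0)
queue.add("alice/community-favourite-merge", votes=42)
queue.add("Qwen/new-flagship-release", votes=0)
print(queue.next_to_evaluate())  # -> "Qwen/new-flagship-release"
```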

clefourrier changed discussion status to closed

Does the model voting system not also encourage the same kind of abuse? Before the most recent update, there was a timeout system in place that prevented single users and organizations from spamming submissions. Now it appears that guard is no longer in place (or has at least been made more lenient). As a small creator, I'm now incentivized to create many alt accounts to upvote my own models while also spamming submissions in an attempt to monopolize the top slots in the queue. This system further divides creators by popularity.

I know there's a lot of nuance to running a public service, as nothing is ever truly "free," and I respect that this service is available at all. However, I'd assume the goal in making it public is to allow smaller creators and organizations to test their work, since larger corporations have the means to run the evaluation harness themselves. Therefore, I'd like to propose the following changes (with a rough sketch of how they could be enforced after the list):

  1. Create strict rules on how to use the platform that a user must agree to before submitting a model.
    a. A timeout of 24 hours could be sufficient. I'd be happy just having one model tested per day.
    b. A rate limit could be another option, like no more than 3 submissions per 24 hours.
  2. Automatically ban creators that violate the rules.
    a. Yes, they can create another account, but they're already making dozens just to upvote their models anyway. And who knows, maybe they'd learn their lesson(s)?
  3. Reward creators for their popularity and rule-following with public queue priority.
    a. Maybe likes could play a role somehow?
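
To make rules 1 and 2 concrete, here is a rough sketch of what enforcement could look like: a sliding 24-hour window capped at 3 submissions, with violators banned automatically. The names (`SubmissionPolicy`, `try_submit`) and the exact limits are hypothetical, purely to illustrate the proposal:

```python
import time
from collections import defaultdict

# Hypothetical policy values taken from the proposal above.
MAX_SUBMISSIONS_PER_WINDOW = 3
WINDOW_SECONDS = 24 * 60 * 60

class SubmissionPolicy:
    """Sliding-window rate limit with an automatic ban for users who exceed it."""

    def __init__(self):
        self._history = defaultdict(list)  # user -> recent submission timestamps
        self._banned = set()

    def try_submit(self, user: str, now: float | None = None) -> bool:
        now = time.time() if now is None else now
        if user in self._banned:
            return False
        # Keep only submissions inside the current 24-hour window.
        recent = [t for t in self._history[user] if now - t < WINDOW_SECONDS]
        if len(recent) >= MAX_SUBMISSIONS_PER_WINDOW:
            # Rule 2: violating the agreed-upon limit bans the account.
            self._banned.add(user)
            return False
        recent.append(now)
        self._history[user] = recent
        return True
```

Rule 3 (priority from likes or popularity) could then simply feed into whatever ordering the evaluation queue already uses, alongside votes.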

Does that sound fair? Were these options already considered?
