Evaluation on the Hub

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

autoevaluate's activity

lewtun 
posted an update about 21 hours ago
view post
Post
1426
I was initially pretty sceptical about Meta's Coconut paper [1] because the largest perf gains were reported on toy linguistic problems. However, these results on machine translation are pretty impressive!

https://x.com/casper_hansen_/status/1875872309996855343

Together with the recent PRIME method [2] for scaling RL, reasoning for open models is looking pretty exciting for 2025!

[1] Training Large Language Models to Reason in a Continuous Latent Space (2412.06769)
[2] https://huggingface.co/blog/ganqu/prime
lewtun 
posted an update 8 days ago
view post
Post
1965
This paper ( HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs (2412.18925)) has a really interesting recipe for inducing o1-like behaviour in Llama models:

* Iteratively sample CoTs from the model, using a mix of different search strategies. This gives you something like Stream of Search via prompting.
* Verify correctness of each CoT using GPT-4o (needed because exact match doesn't work well in medicine where there are lots of aliases)
* Use GPT-4o to reformat the concatenated CoTs into a single stream that includes smooth transitions like "hmm, wait" etc that one sees in o1
* Use the resulting data for SFT & RL
* Use sparse rewards from GPT-4o to guide RL training. They find RL gives an average ~3 point boost across medical benchmarks and SFT on this data already gives a strong improvement.

Applying this strategy to other domains could be quite promising, provided the training data can be formulated with verifiable problems!
  • 1 reply
·
sayakpaul 
posted an update 14 days ago
sayakpaul 
posted an update 20 days ago
view post
Post
1758
In the past seven days, the Diffusers team has shipped:

1. Two new video models
2. One new image model
3. Two new quantization backends
4. Three new fine-tuning scripts
5. Multiple fixes and library QoL improvements

Coffee on me if someone can guess 1 - 4 correctly.
  • 1 reply
·
lewtun 
posted an update 21 days ago
view post
Post
6665
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥

How? By combining step-wise reward models with tree search algorithms :)

We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"

We're open sourcing the full recipe and sharing a detailed blog post.

In our blog post we cover:

📈 Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test-time.

🎄 Diverse Verifier Tree Search (DVTS): An unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.

🧭 Search and Learn: A lightweight toolkit for implementing search strategies with LLMs and built for speed with vLLM

Here's the links:

- Blog post: HuggingFaceH4/blogpost-scaling-test-time-compute

- Code: https://github.com/huggingface/search-and-learn

Enjoy!
  • 2 replies
·
dylanebert 
posted an update 25 days ago
view post
Post
2006
TRELLIS is now the highest ranked open-source model in the 3D Arena Leaderboard, surpassing InstantMesh

dylanebert/3d-arena
lhoestq 
posted an update 25 days ago
view post
Post
1665
Made a HF Dataset editor a la gg sheets here: lhoestq/dataset-spreadsheets

With Dataset Spreadsheets:
✏️ Edit datasets in the UI
🔗 Share link with collaborators
🐍 Use locally in DuckDB or Python

Available for the 100,000+ parquet datasets on HF :)
sayakpaul 
posted an update 28 days ago
view post
Post
2072
Introducing a high-quality open-preference dataset to further this line of research for image generation.

Despite being such an inseparable component for modern image generation, open preference datasets are a rarity!

So, we decided to work on one with the community!

Check it out here:
https://huggingface.co/blog/image-preferences
  • 7 replies
·
thomwolf 
posted an update 29 days ago
view post
Post
4572
We are proud to announce HuggingFaceFW/fineweb-2: A sparkling update to HuggingFaceFW/fineweb with 1000s of 🗣️languages.

We applied the same data-driven approach that led to SOTA English performance in🍷 FineWeb to thousands of languages.

🥂 FineWeb2 has 8TB of compressed text data and outperforms other multilingual datasets in our experiments.

The dataset is released under the permissive 📜 ODC-By 1.0 license, and the 💻 code to reproduce it and our evaluations is public.

We will very soon announce a big community project, and are working on a 📝 blogpost walking you through the entire dataset creation process. Stay tuned!

In the mean time come ask us question on our chat place: HuggingFaceFW/discussion

H/t @guipenedo @hynky @lvwerra as well as @vsabolcec Bettina Messmer @negar-foroutan and @mjaggi
  • 2 replies
·
sayakpaul 
posted an update 29 days ago
view post
Post
2119
The Control family of Flux from @black-forest-labs should be discussed more!

It enables structural controls like ControlNets while being significantly less expensive to run!

So, we're working on a Control LoRA training script 🤗

It's still WIP, so go easy:
https://github.com/huggingface/diffusers/pull/10130
dylanebert 
posted an update about 1 month ago
view post
Post
2813
blender has AI now
dylanebert 
posted an update about 1 month ago
view post
Post
3129
🟦 New open-source Image-to-3D model from Microsoft

TRELLIS: Structured 3D Latents for Scalable and Versatile 3D Generation

it's really good! the topology isn't clean, but it's a very very good 3D reference

JeffreyXiang/TRELLIS-image-large
thomwolf 
posted an update about 1 month ago
thomwolf 
posted an update about 1 month ago
abhishek 
posted an update about 1 month ago
view post
Post
1712
🎉 SUPER BLACK FRIDAY DEAL 🎉

Train almost any model on a variety of tasks such as llm finetuning, text classification/regression, summarization, question answering, image classification/regression, object detection, tabular data, etc for FREE using AutoTrain locally. 🔥
https://github.com/huggingface/autotrain-advanced
sayakpaul 
posted an update about 1 month ago
dylanebert 
posted an update about 1 month ago
view post
Post
1607
Generate meshes with AI locally in Blender

📢 New open-source release

meshgen, a local blender integration of LLaMa-Mesh, is open source and available now 🤗

get started here: https://github.com/huggingface/meshgen
thomwolf 
posted an update about 1 month ago