I was initially pretty sceptical about Meta's Coconut paper [1] because the largest perf gains were reported on toy linguistic problems. However, these results on machine translation are pretty impressive!
After some heated discussion 🔥, we clarify our intent re. storage limits on the Hub
TL;DR:
- public storage is free and (barring blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)
We applied the same data-driven approach that led to SOTA English performance in 🍷 FineWeb to thousands of languages.
🥂 FineWeb2 has 8TB of compressed text data and outperforms other multilingual datasets in our experiments.
The dataset is released under the permissive 📜 ODC-By 1.0 license, and the 💻 code to reproduce it and our evaluations is public.
We will very soon announce a big community project, and are working on a 📝 blogpost walking you through the entire dataset creation process. Stay tuned!
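If you want to poke at the data right away, a single language subset can be streamed with 🤗 datasets. A minimal sketch (the repo id and config name below are given for illustration; check the dataset card for the exact identifiers):

from datasets import load_dataset

# Stream one language subset instead of downloading all 8TB.
# Repo id and config name are assumptions; see the dataset card.
fw2 = load_dataset(
    "HuggingFaceFW/fineweb-2",
    name="fra_Latn",  # e.g. French in Latin script
    split="train",
    streaming=True,
)

for doc in fw2.take(3):
    print(doc["text"][:200])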
The latest o1 model from OpenAI is still unable to correctly answer whether 9.11 > 9.9 🤔
A possible explanation? Tokenization - and our latest work investigates how it affects a model's ability to do math!
In this blog post, we discuss:
🔢 The different ways numbers are tokenized in modern LLMs
🧪 Our detailed approach to comparing these various methods
🥪 How we got a free boost in arithmetic performance by adding a few lines of code to the base Llama 3 tokenizer (see the sketch below)
👑 And a definitive, best tokenization method for math in LLMs!
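The exact lines are in the post, but to give a flavor of the kind of tweak we mean, here is a minimal sketch of right-to-left three-digit grouping, one way to pre-process numbers before tokenization (the helper name and separator choice are illustrative, not our actual code):

import re

def group_digits_right_to_left(text: str, sep: str = ",") -> str:
    """Insert a separator every 3 digits, counting from the right,
    so the tokenizer always sees consistent 1-3 digit chunks."""
    def _group(match: re.Match) -> str:
        digits = match.group(0)
        chunks = []
        while digits:  # peel off 3 digits at a time from the right
            chunks.append(digits[-3:])
            digits = digits[:-3]
        return sep.join(reversed(chunks))
    return re.sub(r"\d+", _group, text)

print(group_digits_right_to_left("12345 + 678 = 13023"))
# -> 12,345 + 678 = 13,023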
🤖 𝗔𝗱𝗼𝗯𝗲'𝘀 𝗰𝗼𝗱𝗲-𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗻𝗴 𝗮𝗴𝗲𝗻𝘁 𝗿𝗲𝗮𝗰𝗵𝗲𝘀 𝘁𝗵𝗲 𝘁𝗼𝗽 𝗼𝗳 𝗚𝗔𝗜𝗔 𝗹𝗲𝗮𝗱𝗲𝗿𝗯𝗼𝗮𝗿𝗱 - and their paper cites my work!
💡 Reminder: in short, an agentic system is a harness you put your LLM in to give it access to the outside world.
➡️ The team of researchers at Adobe started from the idea that current agentic systems lack the ability to define their own tools. So they built an agent that writes its actions as code, allowing it to write Python functions that can be re-used later as tools!
Here's what the LLM generations can look like with the proper prompt:
Thought: I need to access the Excel file using a different method.
Action:
def access_excel_file(file_path):
    ...  # rest of the code (the agent writes it, but I don't have room in this post)
    return rows
Then your system executes this and appends the observation to the agent's memory.
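In pseudo-real Python, one iteration of that loop looks roughly like this (a sketch, not DynaSaur's actual code; the llm callable and the prompt format are stand-ins):

import subprocess, sys

def run_python(code: str) -> str:
    # Toy "sandbox": run the code action in a subprocess and capture its output.
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True, timeout=30)
    return result.stdout + result.stderr

def agent_step(llm, memory: list, task: str) -> str:
    # One iteration: ask for a code action, execute it, store the observation.
    memory.append({"role": "user", "content": task})
    reply = llm(memory)                    # e.g. "Thought: ...\nAction:\n<code>"
    code = reply.split("Action:", 1)[1]    # naive extraction of the code action
    observation = run_python(code)
    memory.append({"role": "assistant", "content": reply})
    memory.append({"role": "user", "content": "Observation: " + observation})
    return observation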
Why is this code formulation better than the classical JSON tool-call formulation? The paper explains:
"Most existing work uses text or JSON as the representation of actions, which significantly lacks the two criteria mentioned earlier: generality and composability. In contrast, DynaSaur can utilize available actions or create new ones if necessary, using code as a unified representation. In principle, acting with code enables agents to solve any Turing-complete problem."
The idea of using code is not new: in fact, we do it in transformers.agents (hence the citation I got). Their implementation adds further refinements, like using RAG to retrieve relevant functions before generating an action, which boosts performance further.
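That retrieval step can be as simple as embedding the descriptions of previously written functions and pulling the closest ones into the prompt. A sketch with sentence-transformers (toy tool list, not their actual pipeline):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Descriptions of tools the agent has written so far (toy examples).
tool_docs = [
    "access_excel_file(file_path): read rows from an Excel file",
    "download_url(url): fetch a web page and return its text",
    "plot_histogram(values): save a histogram of a list of numbers",
]
tool_embeddings = model.encode(tool_docs, convert_to_tensor=True)

def retrieve_tools(task: str, k: int = 2) -> list[str]:
    # Return the k tool descriptions most similar to the task,
    # to paste into the agent's prompt before it generates an action.
    query = model.encode(task, convert_to_tensor=True)
    hits = util.semantic_search(query, tool_embeddings, top_k=k)[0]
    return [tool_docs[hit["corpus_id"]] for hit in hits]

print(retrieve_tools("open the spreadsheet and count rows"))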
And they observe that code agents perform much better, reaching the top of the GAIA leaderboard! 🥇
Go take a look, it's really clear and informative!
Hi HuggingFacers! 🤗 I'm thrilled to introduce my latest project: 𝗦𝗲𝗻𝗧𝗿𝗘𝘃 (𝗦𝗲𝗻tence 𝗧𝗿ansformers 𝗘𝘃aluator), a Python package that offers simple, customizable evaluation of the text-retrieval accuracy and time performance of Sentence Transformers-compatible text embedders on PDF data! 📊