Article: LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark. By wolfram • 2 days ago
Post: Supercharge your LLM apps with smolagents! However cool your LLM is, without being agentic it can only go so far. Enter smolagents: a new agent library by Hugging Face to make the LLM write code, do analysis and automate boring stuff! Here's our blog for you to get started: https://huggingface.co/blog/smolagents
Post: The deepseek-ai/DeepSeek-V3 is very good! I have been playing with it and found it is really good at one-shotting a pretty good landing page. You can play with it here: https://deepseek-artifacts.vercel.app. All the responses get saved in the cfahlgren1/react-code-instructions dataset. Hopefully we can build one of the biggest, highest quality frontend datasets on the hub.
Post: Check out the early preview of the upcoming Tachibana-QVQ dataset: code-reasoning and code-instruct data generated with Qwen/QVQ-72B-Preview. Link here: sequelbox/Tachibana-QVQ-PREVIEW. More to come :)
Paper: How Well Do LLMs Generate Code for Different Application Domains? Benchmark and Evaluation. arXiv 2412.18573 • Published 11 days ago