Article 🐺🐦‍⬛ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark By wolfram • 2 days ago • 27
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey Paper • 2412.18619 • Published 20 days ago • 46
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 18 days ago • 116
No More Adam: Learning Rate Scaling at Initialization is All You Need Paper • 2412.11768 • Published 19 days ago • 41
Article 🐺🐦‍⬛ LLM Comparison/Test: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs By wolfram • Dec 4, 2024 • 75
Article Releasing the largest multilingual open pretraining dataset By Pclanglais • Nov 13, 2024 • 98
Article ⚗️ 🧑🏼‍🌾 Let's grow some Domain Specific Datasets together By burtenshaw • Apr 29, 2024 • 29
Article RAG Empowerment: Cohere C4AI Command-R and Transformers Unveiled By Andyrasika • Apr 7, 2024 • 10