- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
  Paper • 2401.10774 • Published • 54
- APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
  Paper • 2401.06761 • Published • 1
- Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
  Paper • 2401.02669 • Published • 14
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 53
Collections including paper arxiv:2402.13144
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 145
- Orion-14B: Open-source Multilingual Large Language Models
  Paper • 2401.12246 • Published • 12
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 53
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 45

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 605
- CLEAR: Character Unlearning in Textual and Visual Modalities
  Paper • 2410.18057 • Published • 200
- Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
  Paper • 2410.22366 • Published • 77
- Emu3: Next-Token Prediction is All You Need
  Paper • 2409.18869 • Published • 94