Hydragen: High-Throughput LLM Inference with Shared Prefixes Paper • 2402.05099 • Published Feb 7, 2024 • 19
1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs Paper • 2410.16144 • Published Oct 21, 2024 • 3