Article Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference 3 days ago • 42
DataGemma Release Collection A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items • Updated Dec 13, 2024 • 82
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 257