A dynamic parallel method for performance optimization on hybrid CPUs Paper • 2411.19542 • Published Nov 29, 2024
The Model Openness Framework: Promoting Completeness and Openness for Reproducibility, Transparency, and Usability in Artificial Intelligence Paper • 2403.13784 • Published Mar 20, 2024
CLAIMED -- the open source framework for building coarse-grained operators for accelerated discovery in science Paper • 2307.06824 • Published Jul 12, 2023
Introducing v0.5 of the AI Safety Benchmark from MLCommons Paper • 2404.12241 • Published Apr 18, 2024
TEQ: Trainable Equivalent Transformation for Quantization of LLMs Paper • 2310.10944 • Published Oct 17, 2023
Efficient Post-training Quantization with FP8 Formats Paper • 2309.14592 • Published Sep 26, 2023
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs Paper • 2309.05516 • Published Sep 11, 2023
An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs Paper • 2306.16601 • Published Jun 28, 2023