GWQ: Gradient-Aware Weight Quantization for Large Language Models Paper • 2411.00850 • Published Oct 30, 2024
IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact Paper • 2403.01241 • Published Mar 2, 2024