[NeurIPS'24] Q-VLM: Post-training Quantization for Large Vision-Language Models

An efficient and accurate memory-saving post-training quantization method for W4A4 (4-bit weights, 4-bit activations) large multi-modal models. [Paper][Code]

Q-VLM: Post-training Quantization for Large Vision-Language Models
Changyuan Wang, Ziwei Wang, Xiuwei Xu, Yansong Tang, Jie Zhou, Jiwen Lu
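
For context, W4A4 means both a layer's weights and its input activations are quantized to 4 bits. The snippet below is a minimal, generic per-tensor uniform fake-quantizer meant only to illustrate the W4A4 setting; it is not Q-VLM's actual algorithm, and all names in it are illustrative.

```python
import torch

def fake_quantize(x: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Asymmetric uniform quantization followed by dequantization (illustrative)."""
    qmax = 2 ** n_bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / qmax
    zero_point = x.min()
    q = torch.round((x - zero_point) / scale).clamp(0, qmax)
    return q * scale + zero_point

# W4A4: quantize both the layer's weights and its input activations to 4 bits.
weight = torch.randn(4096, 4096)      # toy linear-layer weight
activation = torch.randn(1, 4096)     # toy input activation
output = fake_quantize(activation) @ fake_quantize(weight).t()
```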

Fine-tuning the LLaVA Model on the ScienceQA Dataset

Thanks to LLaVA (https://github.com/haotian-liu/LLaVA) for the amazing open-source model!

We combined the LLaVA-7B-v1.1 base model with the projector from LLaVA-7B-v1.3 and fine-tuned the resulting model on the ScienceQA dataset. This checkpoint is used to evaluate the effectiveness of our quantization method on ScienceQA. A sketch of the projector-grafting step is shown below.
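
The following is a minimal sketch of how the two checkpoints could be combined with plain PyTorch state dicts. The file paths and the assumption that projector parameters live under keys containing "mm_projector" are hypothetical; adapt them to the actual checkpoint layout.

```python
import torch

# Hypothetical checkpoint paths; adjust to wherever the weights actually live.
base_ckpt = "llava-7b-v1.1/pytorch_model.bin"
projector_ckpt = "llava-7b-v1.3/mm_projector.bin"

base_state = torch.load(base_ckpt, map_location="cpu")
proj_state = torch.load(projector_ckpt, map_location="cpu")

# Overwrite every projector parameter in the v1.1 state dict with the v1.3
# weights, assuming they are stored under keys containing "mm_projector".
for key, value in proj_state.items():
    if "mm_projector" in key:
        base_state[key] = value

torch.save(base_state, "llava-7b-v1.1-v1.3proj/pytorch_model.bin")
```

The merged checkpoint can then be fine-tuned on ScienceQA with LLaVA's standard training scripts.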
