VCoder LLaVA-1.5-7b

VCoder LLaVA-1.5-7b was trained on the COST training dataset in December 2023. It uses the pretrained LLaVA-1.5-7b model weights. It was introduced by Jain et al. in this repository.

VCoder is an adapter that improves existing Multimodal LLMs at object-level perception tasks by using perception modalities as control inputs, while retaining performance on other tasks.
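As a minimal sketch of how the checkpoint files might be fetched locally, the snippet below uses `snapshot_download` from the `huggingface_hub` package. The repository id `shi-labs/vcoder_llava-v1.5-7b` is the one listed on this model page. The helper name `download_checkpoint` is hypothetical and introduced only for illustration. This only downloads the files; loading and running the model is assumed to require the VCoder codebase, which is not shown here.

```python
from pathlib import Path
from typing import Optional

# Repository id as listed on the model page.
REPO_ID = "shi-labs/vcoder_llava-v1.5-7b"


def download_checkpoint(cache_dir: Optional[str] = None) -> Path:
    """Download every file in the checkpoint repository and return the local directory.

    Hypothetical helper for illustration; not part of the VCoder codebase.
    """
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(repo_id=REPO_ID, cache_dir=cache_dir)
    return Path(local_dir)


if __name__ == "__main__":
    # Requires network access and `pip install huggingface_hub`.
    path = download_checkpoint()
    print(f"Checkpoint files are in: {path}")
```

Passing `cache_dir` is optional; when it is omitted, the files go to the default Hugging Face cache location.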

Citation

@article{jain2023vcoder,
    title={{VCoder: Versatile Vision Encoders for Multimodal Large Language Models}},
    author={Jitesh Jain and Jianwei Yang and Humphrey Shi},
    journal={arXiv},
    year={2023}
}
