Chinese vision-language multimodal models for captions and robot actions

Release

  • [9/22] 🔥 We release two major models. The CN-caption model is for accurate Chinese image captioning, while the robot action model is for demo-level robot actions.

Contents

CNCaption models

This model can provide accurate and fine-grained Chinese descriptions of given images.

Robot models

This model can provide accurate instructions for robot actions.

Install

  1. Install Package
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  
pip install -e .
  2. Install additional packages for training cases
pip install ninja
pip install flash-attn --no-build-isolation

Demo

To run our demo, you need to prepare LLaVA checkpoints locally. Please follow the instructions here to download the checkpoints.

Gradio Web UI

To launch a Gradio demo locally, please run the following commands one by one. If you plan to launch multiple model workers to compare between different checkpoints, you only need to launch the controller and the web server ONCE.

Launch a controller

python -m llava.serve.controller --host 0.0.0.0 --port 10000

Launch a Gradio web server

python -m llava.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload

You have now launched the Gradio web interface. Open it with the URL printed on the screen.

Launch a model worker

This is the actual worker that performs inference on the GPU. Each worker is responsible for a single model, specified with --model-path. In the command below, replace {--model-path} with the path to your local checkpoint.

python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path {--model-path}
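
To compare several checkpoints, each one needs its own worker on a distinct port, all registered with the same controller. The following is a minimal sketch of how those command lines could be generated. It uses only the flags shown above; the helper name worker_commands, the port numbering scheme, and the example checkpoint paths are hypothetical and not part of this repository.

```python
import shlex


def worker_commands(model_paths, controller="http://localhost:10000",
                    first_port=40000):
    """Return one model_worker command (as an argv list) per checkpoint.

    Each worker gets its own port, counted up from first_port, and
    registers with the same controller.
    """
    commands = []
    for offset, model_path in enumerate(model_paths):
        port = first_port + offset
        commands.append([
            "python", "-m", "llava.serve.model_worker",
            "--host", "0.0.0.0",
            "--controller", controller,
            "--port", str(port),
            "--worker", f"http://localhost:{port}",
            "--model-path", model_path,
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical checkpoint directories; substitute your own paths.
    paths = ["./checkpoints/cn-caption", "./checkpoints/robot"]
    for argv in worker_commands(paths):
        # shlex.join quotes each argument so the line can be pasted into a shell.
        print(shlex.join(argv))
```

Run each printed command in its own terminal. The controller and the web server still only need to be started once.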

API inference

We also provide an API interface for more convenient use. We provide server-side startup scripts and client-side test code here.

Server

python -m llava.serve.controller --host 0.0.0.0 --port 10000
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path {--model-path}
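
The two server commands are meant to run in order, since the worker is pointed at the controller through --controller. Below is a small sketch that waits for the controller port to accept connections before the worker is started. It assumes the worker needs the controller to be reachable at startup, which this README does not state explicitly; the function name wait_for_port and the timeout values are illustrative.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds.

    Retries about every `interval` seconds and returns False once
    `timeout` seconds have passed without a successful connection.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers refused connections and connect timeouts alike.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, calling wait_for_port("localhost", 10000) after starting the controller and launching the model worker only when it returns True avoids starting the worker too early. The check only confirms that the port is open, not that the controller is fully initialized.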

Client

python req_test.py "${text}" "${image}"