LLaVA-Phi Model

LLaVA-Phi is a vision-language model that pairs Microsoft's Phi-1.5 language model with a CLIP vision encoder, so it can answer both plain text prompts and questions about images. A sketch of how the two components are likely wired together follows the list below.

Model Description

  • Base Model: Microsoft Phi-1.5
  • Vision Encoder: CLIP ViT-B/32
  • Training: QLoRA fine-tuning
  • Dataset: LLaVA-Instruct-150K
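
The card does not document how image features reach the language model. In the standard LLaVA recipe, a learned projection maps CLIP patch embeddings into the language model's token-embedding space; a minimal sketch, assuming a single linear projector and the usual hidden sizes (768 for CLIP ViT-B/32, 2048 for Phi-1.5), looks like this:

```python
import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    """Hypothetical LLaVA-style projector; the projector actually used by
    sagar007/Lava_phi is not documented here and may differ."""
    def __init__(self, clip_hidden: int = 768, phi_hidden: int = 2048):
        super().__init__()
        self.proj = nn.Linear(clip_hidden, phi_hidden)

    def forward(self, clip_features: torch.Tensor) -> torch.Tensor:
        # clip_features: (batch, num_patches, 768) from CLIP ViT-B/32
        # returns:       (batch, num_patches, 2048), ready to be spliced into
        #                the Phi-1.5 input embeddings at the <image> position
        return self.proj(clip_features)
```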

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoProcessor
import torch
from PIL import Image

# Load the model, its tokenizer, and the CLIP image processor
model = AutoModelForCausalLM.from_pretrained("sagar007/Lava_phi")
tokenizer = AutoTokenizer.from_pretrained("sagar007/Lava_phi")
processor = AutoProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Text-only generation
def generate_text(prompt):
    inputs = tokenizer(f"human: {prompt}\ngpt:", return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Image-conditioned generation
def process_image_and_prompt(image_path, prompt):
    image = Image.open(image_path).convert("RGB")  # CLIP expects 3-channel input
    image_tensor = processor(images=image, return_tensors="pt").pixel_values

    # <image> marks where the projected image features are inserted
    inputs = tokenizer(f"human: <image>\n{prompt}\ngpt:", return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            images=image_tensor,
            max_new_tokens=128,
        )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```
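
A quick smoke test of both paths; the prompt and image path below are placeholders for illustration, not files shipped with the model:

```python
print(generate_text("What is the capital of France?"))
print(process_image_and_prompt("example.jpg", "Describe this image."))  # hypothetical path
```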

Training Details

  • Trained using QLoRA (Quantized Low-Rank Adaptation); a configuration sketch follows this list
  • 4-bit quantization for efficiency
  • Gradient checkpointing enabled
  • Mixed precision training (bfloat16)
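
The card lists the training setup but not the exact hyperparameters. A minimal sketch of how these four bullets map onto transformers + peft, with assumed values for rank, alpha, dropout, and target modules (none of which are published here):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization for efficiency
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # mixed precision (bfloat16)
)

base = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-1_5", quantization_config=bnb_config
)
base.gradient_checkpointing_enable()  # gradient checkpointing enabled
base = prepare_model_for_kbit_training(base)

# QLoRA adapter; r, alpha, dropout, and target modules are assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```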

License

MIT License

Citation

```bibtex
@software{llava_phi_2024,
  author = {sagar007},
  title = {LLaVA-Phi: Vision-Language Model},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/sagar007/Lava_phi}
}
```