How to use CloveAI/clov-vl-2b with Unsloth Studio:

macOS / Linux:

```shell
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for CloveAI/clov-vl-2b to start chatting
```

Windows (PowerShell):

```shell
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for CloveAI/clov-vl-2b to start chatting
```

In the browser:

```shell
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for CloveAI/clov-vl-2b to start chatting
```
```shell
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="CloveAI/clov-vl-2b",
    max_seq_length=2048,
)
```

A finetuned version of Qwen2-VL 2B Instruct, trained to answer natural language questions about charts and graphs.
Finetuned on the ChartQA dataset using Unsloth on a Google Colab free T4 GPU.
| Setting | Value |
| --- | --- |
| Base Model | Qwen2-VL-2B-Instruct |
| Finetuning Method | LoRA (r=8, alpha=8) |
| Training Data | 2,000 chart QA pairs |
| Training Steps | 500 |
| Batch Size | 8 (2 per device × 4 gradient accumulation) |
| Trainable Parameters | 9,232,384 (0.42% of total) |
| Precision | fp16 |
| Hardware | Google Colab T4 (15 GB VRAM) |
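Two of the table's figures follow from simple arithmetic: the effective batch size is the per-device batch multiplied by the gradient accumulation steps, and each LoRA-adapted weight matrix of shape (d_out, d_in) adds r × (d_in + d_out) parameters for its two low-rank factors. A minimal sketch (the 1024×1024 projection below is illustrative, not Qwen2-VL's actual shape):

```python
def effective_batch_size(per_device: int, grad_accum: int) -> int:
    # Samples contributing to one optimizer step
    return per_device * grad_accum

def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    # LoRA learns a low-rank update B @ A for a frozen W (d_out x d_in),
    # where A is (r x d_in) and B is (d_out x r)
    return r * d_in + d_out * r

print(effective_batch_size(2, 4))          # 8, matching the table
print(lora_param_count(1024, 1024, r=8))   # 16384 for one hypothetical projection
```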
```python
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from PIL import Image
import torch

# Load model and processor
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "alanjoshua2005/alan-vlm",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("alanjoshua2005/alan-vlm")

# Run inference
def ask(image_path, question):
    image = Image.open(image_path).convert("RGB")
    messages = [{"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": question},
    ]}]
    text_prompt = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=False,
    )
    inputs = processor(
        text=text_prompt,
        images=image,
        return_tensors="pt",
    )
    # Move tensors to wherever device_map placed the model
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=64)
    # Decode only the newly generated tokens, not the echoed prompt
    input_len = inputs["input_ids"].shape[1]
    return processor.decode(output[0][input_len:], skip_special_tokens=True)

# Example
answer = ask("chart.png", "What is the value of the highest bar?")
print(answer)
```
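The final slice in `ask` matters: `model.generate` returns the prompt tokens followed by the new tokens, so decoding from `input_len` onward keeps only the answer. A toy illustration with dummy token ids:

```python
# Dummy ids standing in for a real tokenized prompt and generation
prompt_ids = [101, 2054, 2003, 1996]   # pretend prompt, input_len = 4
generated = prompt_ids + [2871, 102]   # generate() echoes the prompt first

input_len = len(prompt_ids)
answer_ids = generated[input_len:]
print(answer_ids)  # [2871, 102] - only the newly generated tokens
```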
```python
import gradio as gr
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from PIL import Image
import torch

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "alanjoshua2005/alan-vlm",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("alanjoshua2005/alan-vlm")

def answer_chart_question(image, question):
    if image is None or not question.strip():
        return "Please provide both an image and a question."
    image = image.convert("RGB")
    messages = [{"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": question},
    ]}]
    text_prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
    inputs = processor(text=text_prompt, images=image, return_tensors="pt")
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=64)
    input_len = inputs["input_ids"].shape[1]
    return processor.decode(output[0][input_len:], skip_special_tokens=True)

gr.Interface(
    fn=answer_chart_question,
    inputs=[gr.Image(type="pil"), gr.Textbox(label="Question")],
    outputs=gr.Textbox(label="Answer"),
    title="ChartQA - alan-vlm",
).launch()
```
Trained on weijiezz/chartqa_split_test, a 2,000-row dataset of chart images paired with questions and answers. It contains two types of questions:

- human_test: questions written by human annotators
- augmented_test: questions generated via data augmentation

Training was done using Unsloth for optimized LoRA finetuning:
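Before training, it can be useful to check how the two question types are distributed. A sketch, assuming each row is a dict with a `type` field holding `"human_test"` or `"augmented_test"` (the actual column name in weijiezz/chartqa_split_test may differ; real rows would also carry an image and an answer):

```python
# Hypothetical rows mimicking the dataset's two question types
rows = [
    {"type": "human_test", "question": "What is the highest bar?"},
    {"type": "augmented_test", "question": "What is the sum of all bars?"},
    {"type": "human_test", "question": "Which year had the lowest value?"},
]

human = [r for r in rows if r["type"] == "human_test"]
augmented = [r for r in rows if r["type"] == "augmented_test"]
print(len(human), len(augmented))  # 2 1
```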
```python
from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Qwen2-VL-2B-Instruct",
    load_in_4bit=True,
)
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,
    finetune_language_layers=True,
    finetune_attention_modules=True,
    finetune_mlp_modules=True,
    r=8,
    lora_alpha=8,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```
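The hyperparameters reported in the table map onto standard `transformers.TrainingArguments` fields. The original notebook's trainer setup is not shown in this card, so the following is only a sketch of a matching configuration; `learning_rate` and `logging_steps` are illustrative assumptions, not values the card reports:

```python
from transformers import TrainingArguments

# Sketch of training arguments matching the reported setup
args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=2,   # from the table
    gradient_accumulation_steps=4,   # 2 x 4 = effective batch size 8
    max_steps=500,                   # from the table
    fp16=True,                       # from the table
    learning_rate=2e-4,              # assumption, a common LoRA default
    logging_steps=10,                # assumption
)
```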
Base model: Qwen/Qwen2-VL-2B