Key Contributions:
LOVA3 - To the best of our knowledge, LOVA3 is the first effort to imbue an MLLM with asking and assessment abilities during training, inspired by human learning mechanisms, with the goal of a more robust and intelligent model.
EvalQABench - We build a new benchmark, EvalQABench, for evaluating VQA correction, as a first step toward advancing future research in this direction.
Performance Improvement - Models trained with the proposed LOVA3 framework show consistent improvement on 10 representative benchmarks.
Model weight
Pretrained weight: LOVA3-llava-v1.5-7b
Download it with the following command:
git clone https://huggingface.co/hhenryz/LOVA3-llava-v1.5-7b
Training Data
The training, evaluation, and test sets of EvalQABench are provided under the folder EvalQABench.
Training data: Mixed_VQA_GenQA_EvalQA_1.5M.jsonl.
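Before training on a file of this size, it can help to look at what a few records contain. The following is a minimal sketch, assuming the file follows the usual JSON Lines convention of one JSON object per line (the extension suggests this, but the text does not confirm it). The helper name `summarize_jsonl` and the `data/` location are illustrative, and no field names are assumed.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path, max_records=1000):
    """Count records and top-level keys in the first `max_records` records."""
    key_counts = Counter()
    n_records = 0
    with Path(path).open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            n_records += 1
            if isinstance(record, dict):
                key_counts.update(record.keys())
            if n_records >= max_records:
                break
    return n_records, dict(key_counts)


if __name__ == "__main__":
    # Assumed location; adjust to wherever the file was downloaded.
    data_path = Path("data/Mixed_VQA_GenQA_EvalQA_1.5M.jsonl")
    if data_path.exists():
        n, keys = summarize_jsonl(data_path)
        print(f"read {n} records; top-level keys: {keys}")
    else:
        print(f"{data_path} not found")
```

Reading only the first records keeps the check fast; raise `max_records` to scan more of the file.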
Image Datasets
Please download the images from the constituent datasets:
- COCO: train2014
- GQA: images
- OCR-VQA: download script; we save all files as .jpg
- AOKVQA: download script
- TextVQA: train_val_images
- VisualGenome: part1, part2
- LLaVA-Instruct: huggingface
Evaluation
- Download LOVA3-llava-v1.5-7b under the folder checkpoints.
- Download the CLIP vision encoder clip-vit-large-patch14-336 under the folder checkpoints.
- Run the evaluation scripts under the folder scripts/v1_5/eval. There are 12 multimodal datasets and benchmarks awaiting evaluation.
Taking VizWiz as an example, the commands are as follows:
modelname=LOVA3-llava-v1.5-7b
python -m llava.eval.model_vqa_loader \
--model-path checkpoints/$modelname \
--question-file ./playground/data/eval/vizwiz/llava_test.jsonl \
--image-folder /yourpath/vizwiz/test/ \
--answers-file ./playground/data/eval/vizwiz/answers/$modelname.jsonl \
--temperature 0 \
--conv-mode vicuna_v1
python scripts/convert_vizwiz_for_submission.py \
--annotation-file ./playground/data/eval/vizwiz/llava_test.jsonl \
--result-file ./playground/data/eval/vizwiz/answers/$modelname.jsonl \
--result-upload-file ./playground/data/eval/vizwiz/answers_upload/$modelname.json
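Before converting the answers for submission, it may be worth confirming that every question received an answer. The sketch below compares the question file with the answers file. It assumes both are JSON Lines files and that each record carries a `question_id` field; that key name and the helper names are assumptions not stated above, so adjust `id_key` if the files differ.

```python
import json
from pathlib import Path


def load_ids(path, id_key="question_id"):
    """Return the id of every record in a JSON Lines file, in file order."""
    ids = []
    with Path(path).open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                ids.append(json.loads(line)[id_key])
    return ids


def missing_answers(question_file, answers_file, id_key="question_id"):
    """List question ids that appear in the question file but not in the answers."""
    answered = set(load_ids(answers_file, id_key))
    return [qid for qid in load_ids(question_file, id_key) if qid not in answered]


if __name__ == "__main__":
    modelname = "LOVA3-llava-v1.5-7b"
    questions = Path("playground/data/eval/vizwiz/llava_test.jsonl")
    answers = Path(f"playground/data/eval/vizwiz/answers/{modelname}.jsonl")
    if questions.exists() and answers.exists():
        missing = missing_answers(questions, answers)
        print(f"{len(missing)} questions have no answer")
    else:
        print("question or answers file not found")
```

An empty result means the answers file covers every question and the conversion step can proceed.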
Training
- Download the pretrained MLP adapter weights llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5 and put them under the folder checkpoints.
- Download the model weight clip-vit-large-patch14-336 under the folder checkpoints.
- Download the model weight vicuna-7b-v1.5 under the folder checkpoints.
- Download the training data Mixed_VQA_GenQA_EvalQA_1.5M.jsonl under the folder data.
- Run the training script:
bash scripts/v1_5/finetune.sh
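Since training depends on several downloads being in place, a quick preflight check can catch a missing file before the script starts. This is a minimal sketch, assuming each download is stored under `checkpoints` or `data` in a subfolder or file named after the artifact, as the steps above suggest; the exact layout expected by `finetune.sh` is not confirmed here, and `find_missing` is an illustrative helper.

```python
from pathlib import Path

# Paths taken from the steps above; the subfolder naming is an assumption.
REQUIRED_PATHS = [
    "checkpoints/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5",
    "checkpoints/clip-vit-large-patch14-336",
    "checkpoints/vicuna-7b-v1.5",
    "data/Mixed_VQA_GenQA_EvalQA_1.5M.jsonl",
]


def find_missing(root=".", required=REQUIRED_PATHS):
    """Return the required relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).exists()]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing before training:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All required files found; run: bash scripts/v1_5/finetune.sh")
```

Run it from the repository root so the relative paths resolve the same way they would for the training script.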
Acknowledgement
Citation
If you find LOVA3 useful, please cite using this BibTeX:
@inproceedings{
zhao2024lova,
title={{LOVA}3: Learning to Visual Question Answering, Asking and Assessment},
author={Hengyuan Zhao and Pan Zhou and Difei Gao and Zechen Bai and Mike Zheng Shou},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=vIOKLMl6wu}
}