Instructions to use JarvisArt/JarvisArt-Preview with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use JarvisArt/JarvisArt-Preview with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="JarvisArt/JarvisArt-Preview")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("JarvisArt/JarvisArt-Preview")
model = AutoModelForImageTextToText.from_pretrained("JarvisArt/JarvisArt-Preview")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use JarvisArt/JarvisArt-Preview with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "JarvisArt/JarvisArt-Preview"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "JarvisArt/JarvisArt-Preview",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
Use Docker
docker model run hf.co/JarvisArt/JarvisArt-Preview
- SGLang
How to use JarvisArt/JarvisArt-Preview with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "JarvisArt/JarvisArt-Preview" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "JarvisArt/JarvisArt-Preview",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "JarvisArt/JarvisArt-Preview" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "JarvisArt/JarvisArt-Preview",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
- Docker Model Runner
How to use JarvisArt/JarvisArt-Preview with Docker Model Runner:
docker model run hf.co/JarvisArt/JarvisArt-Preview
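The curl calls shown for vLLM and SGLang can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are illustrative and not part of any JarvisArt, vLLM, or SGLang API. It assumes a server is already running locally on the port used above (8000 for vLLM, 30000 for SGLang) and that the reply follows the usual OpenAI-compatible chat-completions shape.

```python
import json
import urllib.request


def build_chat_request(prompt, image_url, model="JarvisArt/JarvisArt-Preview"):
    """Build the JSON body for an OpenAI-compatible /v1/chat/completions call.

    Mirrors the payload used in the curl examples: one user message that
    carries a text part and an image_url part.
    """
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def send_chat_request(payload, base_url="http://localhost:8000"):
    """POST the payload to a running server and return the parsed JSON reply.

    base_url is an assumption: use port 8000 for the vLLM example and
    port 30000 for the SGLang example.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a server running, `send_chat_request(build_chat_request("Describe this image in one sentence.", image_url))` returns the decoded response; on OpenAI-compatible servers the generated text is typically found under `["choices"][0]["message"]["content"]`.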
Yunlong Lin1*, Zixu Lin1*, Kunjie Lin1*, Jinbin Bai5, Panwang Pan4, Chenxin Li3, Haoyu Chen2, Zhongdao Wang6, Xinghao Ding1†, Wenbo Li3♣, Shuicheng Yan5†
1Xiamen University, 2The Hong Kong University of Science and Technology (Guangzhou), 3The Chinese University of Hong Kong, 4Bytedance, 5National University of Singapore, 6Tsinghua University
⚠️ Security Warning
IMPORTANT: This is the ONLY official JarvisArt repository!
We have identified fake repositories claiming to be JarvisArt that may contain malware, viruses, or malicious code. Please be extremely cautious and only use this official repository.
Known fake/malicious repositories:
- ❌ https://github.com/joelp0/JarvisArt - FAKE & POTENTIALLY DANGEROUS
- ❌ Any other repositories not from our official organization
📝 Overview
JarvisArt workflow and results showcase
JarvisArt is a multi-modal large language model (MLLM)-driven agent for intelligent photo retouching. It is designed to liberate human creativity by understanding user intent, mimicking the reasoning of professional artists, and coordinating over 200 tools in Adobe Lightroom. JarvisArt utilizes a novel two-stage training framework, starting with Chain-of-Thought supervised fine-tuning for foundational reasoning, followed by Group Relative Policy Optimization for Retouching (GRPO-R) to enhance its decision-making and tool proficiency. Supported by the newly created MMArt dataset (55K samples) and MMArt-Bench, JarvisArt demonstrates superior performance, outperforming GPT-4o with a 60% improvement in pixel-level metrics for content fidelity while maintaining comparable instruction-following capabilities.
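To give a feel for the "group relative" idea behind the second training stage, here is a minimal sketch of the advantage normalization used in standard Group Relative Policy Optimization (GRPO): several candidate outputs are sampled for the same input, each is scored, and each score is normalized against its group's mean and standard deviation. This illustrates generic GRPO only. The retouching-specific rewards and any modifications that make up GRPO-R are not described in this overview and are not reproduced here; the function name and the `eps` term are assumptions for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    rewards: scores for several candidate outputs sampled for one input.
    eps: small constant (an assumption) that avoids division by zero
         when every candidate receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical candidates for one photo: two scored 1.0, two scored 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # roughly [1, -1, -1, 1]
```

Candidates that score above their group's average receive a positive advantage and are reinforced; those below average are discouraged, without needing a separately trained value model.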
🎬 Demo Videos
Global Retouching Case
Local Retouching Case
JarvisArt supports multi-granularity retouching goals, ranging from scene-level adjustments to region-specific refinements. Users can perform intuitive, free-form edits through natural inputs such as text prompts and bounding boxes.
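As a rough illustration of what a region-specific request might carry, the sketch below pairs a text instruction with a bounding box and converts the box from pixel coordinates to fractions of the image size. Everything here is hypothetical: the `RegionEdit` class, the `make_region_edit` helper, and the normalized-coordinate convention are assumptions for illustration, and the actual input format JarvisArt expects is not specified in this document.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RegionEdit:
    """Hypothetical container for a local retouching request."""
    instruction: str
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), each in 0..1


def make_region_edit(instruction, box_px, image_size):
    """Pair an instruction with a pixel-space box, normalized to the image size.

    box_px: (x0, y0, x1, y1) in pixels, corners in any order.
    image_size: (width, height) in pixels.
    """
    width, height = image_size
    x0, y0, x1, y1 = box_px
    # Order the corners so that min <= max on each axis.
    x_min, x_max = sorted((x0, x1))
    y_min, y_max = sorted((y0, y1))
    # Clamp the box to the image bounds before normalizing.
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    return RegionEdit(
        instruction=instruction,
        box=(x_min / width, y_min / height, x_max / width, y_max / height),
    )


edit = make_region_edit("Brighten the face", (100, 50, 300, 250), (400, 500))
print(edit)
```

Normalized coordinates keep the request independent of the image resolution; a scene-level edit would simply omit the box.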
📚 Citation
If you find JarvisArt useful in your research, please consider citing:
@article{jarvisart2025,
title={JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent},
author={Yunlong Lin and Zixu Lin and Kunjie Lin and Jinbin Bai and Panwang Pan and Chenxin Li and Haoyu Chen and Zhongdao Wang and Xinghao Ding and Wenbo Li and Shuicheng Yan},
year={2025},
journal={arXiv preprint arXiv:2506.17612}
}
📧 Contact
For any questions or inquiries, please reach out to us:
- Yunlong Lin: linyl@stu.xmu.edu.cn
- Zixu Lin: a860620266@gmail.com
- Kunjie Lin: linkunjie@stu.xmu.edu.cn
🙏 Acknowledgements
We would like to express our gratitude to LLaMA-Factory and gradio_image_annotator for their valuable open-source contributions, which have provided important technical references for our work.