Anybody know how/what can actually load/inference this model?

#50

by SytanSD - opened Jun 30, 2024

Discussion

SytanSD

Jun 30, 2024

I have tried the following: Oobabooga text-generation-webui, llama.cpp, Ollama, KoboldCpp, Tabby, ExLlamaV2, and LM Studio, and not a single one of them supports this model. I am trying to use this model as an open-source, locally run alternative to GPT-4, as I do not like or wish to support OpenAI in any way, but it seems this model is designed in a way that makes running it in any pre-existing GUI impossible.

Any additional info would be massively appreciated, as I am having to put my job on hold to try and sort out a GPT-4 alternative.

Important to note: I have absolutely zero experience with Diffusers/Transformers, and very little experience with code in general. I am looking for a way to run this model so that I can point a front end at its port and have it fulfill requests from a tagging/captioning GUI.

hzaustingg

Jul 3, 2024

Maybe you can use lmdeploy, which supports MiniCPM-Llama3-V-2_5. From the command line you can launch a Gradio UI, an api_server, or a chat session in the terminal.
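For anyone wanting to point a captioning front end at a port, here is a minimal sketch of a client for an OpenAI-compatible chat endpoint, using only the Python standard library. It assumes a server is already running locally (for example one started with `lmdeploy serve api_server`); the port, the model name the server registers, and the helper names `build_caption_request` and `caption_image` are all assumptions for illustration, not something confirmed in this thread.

```python
import json
import urllib.request

# Assumption: an OpenAI-compatible server is already listening on this address.
# Use whatever host/port your server actually reports at startup.
BASE_URL = "http://localhost:23333/v1"
# Assumption: the server registers the model under this name.
MODEL = "openbmb/MiniCPM-Llama3-V-2_5"


def build_caption_request(image_url, prompt="Describe this image in one sentence."):
    """Build an OpenAI-style chat payload with one text part and one image part."""
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def caption_image(image_url, base_url=BASE_URL, timeout=120):
    """POST the payload to /chat/completions and return the reply text."""
    body = json.dumps(build_caption_request(image_url)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

Calling `caption_image("https://example.com/photo.jpg")` would send one request and return the caption string, provided the server is up and accepts that model name. A tagging GUI that speaks the OpenAI chat format could be pointed at the same base URL.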

hitchhiker3010

Jul 5, 2024

https://github.com/OpenBMB/llama.cpp/tree/minicpm-v2.5 - I believe you can run the llama.cpp server from here.

SytanSD

Jul 6, 2024

> Maybe you can use lmdeploy, which supports MiniCPM-Llama3-V-2_5. From the command line you can launch a Gradio UI, an api_server, or a chat session in the terminal.

I tried this specifically today after a recommendation from a colleague. It seems much more straightforward than whatever mess I had going on, but unfortunately lmdeploy bloats the model to an unusable size. Inference with the web server provided by openbmb works just fine in 24GB of VRAM even at FP16 (though it does not offer the functionality I need). Loading the model in lmdeploy, however, balloons it to an unusable 27GB of VRAM as it converts the model to the turbomind file format. Ironically, this conversion is supposed to make inference faster, but because the model no longer fits in my VRAM, inference goes from 2-3 seconds per image to over 6 minutes.

Additionally, I tried to load the int4 version of the model in lmdeploy, only to find that lmdeploy does not actually support it. So I really don't have any workable options with lmdeploy, as much as I wish I could use it.
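As a rough sanity check on the numbers above, the weight footprint can be estimated from the parameter count and the bytes stored per parameter. This is a back-of-the-envelope sketch only: the figure of roughly 8.5 billion parameters is an assumption based on the model's "8B" class, and the estimate ignores activations, the vision encoder's working memory, and any KV cache the engine reserves.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# Assumption: roughly 8.5 billion parameters for this model.
NUM_PARAMS = 8.5e9

fp16_gib = weight_memory_gib(NUM_PARAMS, 2)    # 2 bytes per parameter
int4_gib = weight_memory_gib(NUM_PARAMS, 0.5)  # 4 bits per parameter

print(f"FP16 weights: {fp16_gib:.1f} GiB")
print(f"int4 weights: {int4_gib:.1f} GiB")
print(f"Headroom on a 24 GiB card at FP16: {24 - fp16_gib:.1f} GiB")
```

Under that assumption the FP16 weights alone fit comfortably within 24GB, which is consistent with the openbmb web server working. The extra usage reported with lmdeploy would then have to come from something other than the weights. Inference engines commonly pre-allocate KV cache memory, and lmdeploy has a `cache_max_entry_count` setting for this, but whether that explains the 27GB figure is not confirmed in this thread.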

Wallis2000

Jul 7, 2024

I ran into the same problem as you, and solved it by downloading the GGUF file. You can give it a try.
