Anybody know how/what can actually load/inference this model?
I have tried the following: Oobabooga text-generation-webui, llama.cpp, Ollama, KoboldCpp, Tabby, ExLlamaV2, and LM Studio, and not a single one of them supports this model. I am trying to use this model as an open-source, locally run alternative to GPT-4, as I do not wish to support OpenAI in any way, but it seems this model is designed in a way that makes running it in any pre-existing GUI impossible.
Any additional info would be massively appreciated, as I am having to put my job on hold to try and sort out a GPT-4 alternative.
Important to note: I have absolutely zero experience with Diffusers/Transformers, and very little experience with code in general. I am trying to find a solution that lets me run this model so that I can point a front end at its port and have it fulfill requests from a tagging/captioning GUI.
Maybe you can use lmdeploy, which supports MiniCPM-Llama3-V-2_5. From the command line you can launch a gradio UI, an api_server, or a chat session in the terminal.
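For the "point a front end at its port" use case, here is a minimal sketch of what a captioning client could look like once a server is running. It assumes the server exposes an OpenAI-compatible /v1/chat/completions route and accepts base64 data URLs for images, which is my understanding of lmdeploy's api_server but is worth checking against its docs. The function names are made up for illustration, and the JPEG MIME type is an assumption about the input images.

```python
import base64
import json
import urllib.request


def build_caption_request(image_bytes: bytes, prompt: str, model: str) -> dict:
    """Build an OpenAI-style chat payload with one text prompt and one inline image."""
    # Assumption: the server accepts images as base64 data URLs, and the image is a JPEG.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
                    },
                ],
            }
        ],
    }


def send_caption_request(base_url: str, payload: dict, timeout: float = 120.0) -> str:
    """POST the payload to <base_url>/v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: the response follows the OpenAI chat completion schema.
    return body["choices"][0]["message"]["content"]
```

To use it, read an image file as bytes, pass it to build_caption_request together with a prompt and the model name "openbmb/MiniCPM-Llama3-V-2_5", then hand the result to send_caption_request with the server's base URL (for example http://localhost:23333, assuming that is the port the server was started on). A tagging GUI that already speaks the OpenAI API may be able to skip this entirely and simply be pointed at that URL.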
https://github.com/OpenBMB/llama.cpp/tree/minicpm-v2.5 - You can run llama.cpp server from here I believe
> Maybe you can use lmdeploy, which supports MiniCPM-Llama3-V-2_5. From the command line you can launch a gradio UI, an api_server, or a chat session in the terminal.
I tried this specifically today after a recommendation from a colleague. It seems much more straightforward than whatever mess I had going on, but unfortunately lmdeploy bloats the model to an unusable size. Inferencing with the web server provided by openbmb works just fine in 24GB VRAM even at FP16 (though it does not offer the functionality I need, unfortunately), but loading the model in lmdeploy balloons it to an unusable 27GB of VRAM while converting it to the turbomind file format. Ironically, this conversion is supposed to make inference faster, but because the model no longer fits, my inference goes from 2-3 seconds per image to over 6 minutes.
Additionally, I tried to load the int4 version of the model in lmdeploy, only to find it is not actually supported. So I really don't have any options that suffice with lmdeploy, as much as I wish I could use it.
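A rough back-of-the-envelope check on the 24GB versus 27GB figures above, as a sketch only. The parameter count is an assumption (roughly 8.5 billion, for a Llama 3 8B base plus a vision encoder), and the idea that the extra memory is a pre-reserved KV cache rather than larger weights is a guess. As far as I know, lmdeploy's TurboMind backend reserves cache according to a cache_max_entry_count fraction whose exact meaning differs between versions, so treat the second function as an illustration of that idea and not a description of what lmdeploy actually does.

```python
GIB = 1024 ** 3

# Assumption: approximate total parameter count, not an official figure.
ASSUMED_PARAMS = 8.5e9


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / GIB


def reserved_kv_cache_gib(total_vram_gib: float, weights_gib: float, cache_fraction: float) -> float:
    """Hypothetical KV cache reservation: a fraction of the VRAM left after the weights."""
    free_gib = max(total_vram_gib - weights_gib, 0.0)
    return free_gib * cache_fraction


fp16_weights = weight_memory_gib(ASSUMED_PARAMS, 2.0)  # 2 bytes per parameter at FP16
int4_weights = weight_memory_gib(ASSUMED_PARAMS, 0.5)  # 4 bits per parameter at int4

print(f"FP16 weights: {fp16_weights:.1f} GiB")
print(f"int4 weights: {int4_weights:.1f} GiB")
print(f"Cache reserved on a 24 GiB card at fraction 0.8: "
      f"{reserved_kv_cache_gib(24.0, fp16_weights, 0.8):.1f} GiB")
```

If the FP16 weights alone come out well under 24GB, as this estimate suggests, then the growth to 27GB is more likely reserved cache or conversion overhead than the model itself getting bigger. In that case lowering the cache fraction (lmdeploy has a --cache-max-entry-count option for this, if I remember correctly) might be worth a try before giving up on it.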
I once encountered the same problem as you, but it was solved by downloading the gguf file. You can give it a try.