Florence-2-base-PromptGen
Florence-2-base-PromptGen is a model trained for MiaoshouAI Tagger for ComfyUI. It is an advanced image captioning tool based on the Microsoft Florence-2 Model and fine-tuned to perfection.
Why another tagging model?
Most vision models today are trained mainly for general vision-recognition purposes, but when prompting and tagging images for model training, the format and level of detail of the captions are quite different.
Florence-2-base-PromptGen is trained for exactly this purpose, aiming to improve the accuracy and usability of prompting and tagging. The model is trained on images and cleaned tags from Civitai, so the captions it produces for an image match the kind of prompts used to generate such images.
Instruction prompt:
A new instruction prompt, <GENERATE_PROMPT>, is added for this purpose alongside <DETAILED_CAPTION> and <MORE_DETAILED_CAPTION>. It responds in Danbooru tagging style with much better accuracy and an appropriate level of detail.
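Since <GENERATE_PROMPT> returns Danbooru-style output, a small post-processing helper can normalize the raw caption before it is stored as a training tag list. The function below is a hypothetical sketch (not part of the model's API), assuming the output is a comma-separated tag string:

```python
def clean_tags(raw: str) -> list[str]:
    """Split a comma-separated Danbooru-style tag string,
    strip whitespace, lowercase, and drop duplicates while
    preserving the original order."""
    seen = set()
    tags = []
    for tag in raw.split(","):
        tag = tag.strip().lower()
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags
```

For example, `clean_tags("1girl, Solo, long hair, solo")` returns `["1girl", "solo", "long hair"]`.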
Version History:
- v0.8: New instruction trained for <GENERATE_PROMPT>
- v0.9: Improved vision ability on uncensored data for <DETAILED_CAPTION> and <MORE_DETAILED_CAPTION>
How to use:
To use this model, you can load it directly from the Hugging Face Model Hub:
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained("MiaoshouAI/Florence-2-base-PromptGen", trust_remote_code=True).to(device)
processor = AutoProcessor.from_pretrained("MiaoshouAI/Florence-2-base-PromptGen", trust_remote_code=True)

prompt = "<GENERATE_PROMPT>"

# Fetch an example image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)

generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    do_sample=False,
    num_beams=3,
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
parsed_answer = processor.post_process_generation(generated_text, task=prompt, image_size=(image.width, image.height))
print(parsed_answer)
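When building a training dataset, captions are commonly saved as a `.txt` sidecar file next to each image. The loop below is a sketch of how the generate-and-parse steps above could be wrapped for a folder of images; `caption_image` is a hypothetical callable standing in for the model pipeline shown above:

```python
from pathlib import Path

def save_captions(image_dir: str, caption_image, exts=(".jpg", ".jpeg", ".png", ".webp")) -> int:
    """Run caption_image(path) on every image in image_dir and write
    the result to a sidecar .txt file with the same stem.
    Returns the number of captions written."""
    written = 0
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() not in exts:
            continue
        caption = caption_image(path)  # e.g. the <GENERATE_PROMPT> call above
        path.with_suffix(".txt").write_text(caption, encoding="utf-8")
        written += 1
    return written
```

This sidecar layout (`image.jpg` / `image.txt`) is the convention most LoRA training scripts expect.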
Use under MiaoshouAI Tagger ComfyUI
If you just want to use this model, you can do so through ComfyUI-Miaoshouai-Tagger:
https://github.com/miaoshouai/ComfyUI-Miaoshouai-Tagger
Detailed usage and installation instructions are available there.