Instructions to use dphn/dolphincoder-starcoder2-15b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use dphn/dolphincoder-starcoder2-15b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="dphn/dolphincoder-starcoder2-15b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("dphn/dolphincoder-starcoder2-15b")
model = AutoModelForCausalLM.from_pretrained("dphn/dolphincoder-starcoder2-15b")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
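If you want tokens printed as they are generated rather than all at once, here is a minimal streaming sketch building on the direct-load example above, using transformers' `TextStreamer` (the `max_new_tokens` value is illustrative):

```python
from transformers import TextStreamer

# Reuses `model`, `tokenizer`, and `inputs` from the direct-load example above.
# skip_prompt=True avoids echoing the chat template back to stdout.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, streamer=streamer, max_new_tokens=256)
```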
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use dphn/dolphincoder-starcoder2-15b with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "dphn/dolphincoder-starcoder2-15b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "dphn/dolphincoder-starcoder2-15b",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
Use Docker
```shell
docker model run hf.co/dphn/dolphincoder-starcoder2-15b
```
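Because the vLLM server above speaks an OpenAI-compatible API, it can also be called from Python. A minimal sketch, assuming the `openai` client package (`pip install openai`) and the pip-served instance on port 8000:

```python
from openai import OpenAI

# Point the client at the local vLLM server; the API key is required but unused
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="dphn/dolphincoder-starcoder2-15b",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```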
- SGLang
How to use dphn/dolphincoder-starcoder2-15b with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "dphn/dolphincoder-starcoder2-15b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "dphn/dolphincoder-starcoder2-15b",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "dphn/dolphincoder-starcoder2-15b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "dphn/dolphincoder-starcoder2-15b",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
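The SGLang endpoint is likewise OpenAI-compatible, so it can be called from Python without a dedicated client. A minimal sketch using `requests`, assuming a server from either setup above is listening on port 30000:

```python
import requests

# POST a chat completion to the OpenAI-compatible SGLang endpoint
payload = {
    "model": "dphn/dolphincoder-starcoder2-15b",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}
resp = requests.post("http://localhost:30000/v1/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```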
- Docker Model Runner
How to use dphn/dolphincoder-starcoder2-15b with Docker Model Runner:
```shell
docker model run hf.co/dphn/dolphincoder-starcoder2-15b
```
dolphincoder tune of starcoder2-15b-instruct?
@ehartford, I've been using dolphincoder and have been really happy with it (q8_0 GGUF via Ollama). Thanks!
I was excited to hear about the starcoder2 instruct models and decided to give them a spin. The results weren't good. The model would do what I asked (my standard eval questions, in Rust), but as soon as I asked it to refine the code (multi-turn), it would default back to Python and make up some function I wasn't even asking about.
Is it worth the effort to fine-tune SC2-I with dolphincoder?
If so, is that something you could add to the pipeline?
I'm happy to give it a spin myself, but it'll take a while: I only have an Nvidia T1000 (4 GB VRAM, compute capability 7.5) and dual sandylakes with 256 GB DDR3. That, and I've never done a fine-tune before...
I wonder if it thought Python would be more efficient, in which case it technically would have given a correct answer. I would try re-asking, but set up the system prompt to ensure all responses are in the language you want.
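For instance, a minimal sketch of that approach, reusing the Transformers pipeline from the instructions above (the system message wording is hypothetical):

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="dphn/dolphincoder-starcoder2-15b")

# Hypothetical system message pinning the response language to Rust
messages = [
    {"role": "system", "content": "You are a coding assistant. Answer only in Rust; never switch languages."},
    {"role": "user", "content": "Refactor the previous function to return a Result instead of panicking."},
]
pipe(messages)
```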
I don't tune instruct models. I only tune base models.
But, I could add the dataset for starcoder2-instruct into the mix of dolphin. In fact, I think I will.
> I wonder if it thought Python would be more efficient, in which case it technically would have given a correct answer. I would try re-asking, but set up the system prompt to ensure all responses are in the language you want.
Ha! I get what you're saying, but I'm evaluating models for a specific use case, which will involve different languages and multiple users. If it can't follow a thread of conversation the way the dolphin tunes can, it's not suitable for my use case. Sadly.
> But, I could add the dataset for starcoder2-instruct into the mix of dolphin. In fact, I think I will.
Sweet! That's a damn good idea! Can't wait for the release.