
Libraries

How to use v000000/NM-12B-Lyris-dev-2 with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="v000000/NM-12B-Lyris-dev-2")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("v000000/NM-12B-Lyris-dev-2")
model = AutoModelForCausalLM.from_pretrained("v000000/NM-12B-Lyris-dev-2")
```
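The snippets above only load the model. As a minimal sketch of actually generating text, the helper below wraps a call to a text-generation pipeline. The function name `generate_text` and its default values are illustrative assumptions, not part of the model card; the keyword arguments are standard generation options of the Transformers text-generation pipeline.

```python
def generate_text(pipe, prompt, max_new_tokens=128, temperature=0.5):
    """Run one sampled completion and return only the newly generated text.

    `pipe` is assumed to behave like a transformers text-generation pipeline:
    called with a prompt, it returns a list of dicts with a "generated_text" key.
    """
    outputs = pipe(
        prompt,
        max_new_tokens=max_new_tokens,  # hard cap on the reply length
        do_sample=True,                 # sample instead of greedy decoding
        temperature=temperature,
        return_full_text=False,         # return the continuation without the prompt
    )
    return outputs[0]["generated_text"]
```

With the `pipe` object created by the pipeline snippet, `generate_text(pipe, "[INST] Name: Hello! [/INST]")` would return the model's reply as a string. Loading a 12B model in F16 needs a correspondingly large amount of memory, so this is best tried on a GPU machine.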

Local Apps

vLLM

How to use v000000/NM-12B-Lyris-dev-2 with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "v000000/NM-12B-Lyris-dev-2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "v000000/NM-12B-Lyris-dev-2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
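The same request can be sent from Python. The sketch below mirrors the curl call using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the optional `stop` field is the stop-sequence parameter of the OpenAI-style completions API, included here as an assumption about how you might want to bound output. Pointing `base_url` at `http://localhost:30000` should work the same way against a server listening on that port.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000",
                             model="v000000/NM-12B-Lyris-dev-2",
                             max_tokens=512, temperature=0.5, stop=None):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    if stop:
        # Optional stop sequences; only sent when the caller provides them.
        payload["stop"] = list(stop)
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request to a running server and return the first completion's text."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

Once the server is running, `complete("Once upon a time,")` would return the generated continuation.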

Use Docker

```
docker model run hf.co/v000000/NM-12B-Lyris-dev-2
```

SGLang

How to use v000000/NM-12B-Lyris-dev-2 with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "v000000/NM-12B-Lyris-dev-2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "v000000/NM-12B-Lyris-dev-2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "v000000/NM-12B-Lyris-dev-2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "v000000/NM-12B-Lyris-dev-2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Docker Model Runner
How to use v000000/NM-12B-Lyris-dev-2 with Docker Model Runner:
```
docker model run hf.co/v000000/NM-12B-Lyris-dev-2
```
Quantized versions are available for using this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Lyris-dev2-Mistral-Nemo-12B-2407

EXPERIMENTAL

An attempt to fix the prompt format and stop token of Sao10K's Lyra-v3 and to boost its smarts, using strategic LATCOS vector similarity merging.

Prototype: unfinished, but it seems to work. It sometimes goes on forever, but it is far more usable and seems to have learnt to output the stop token most of the time. It is still pretty borked, especially when the greeting message is long. It needs even more Nemo-Instruct-2407 merged in.
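Because the stop token is not always emitted, a client-side guard can help. The sketch below is one possible workaround, not something the model card prescribes: it cuts a reply at the first stop marker it finds. The function name `truncate_at_stop` and the choice of `</s>` and `[INST]` as markers are assumptions based on the prompt format used by this model.

```python
def truncate_at_stop(text, stop_markers=("</s>", "[INST]")):
    """Cut `text` at the earliest stop marker, if any marker is present."""
    cut = len(text)
    for marker in stop_markers:
        index = text.find(marker)
        if index != -1:
            # Keep the earliest position across all markers.
            cut = min(cut, index)
    # Drop trailing whitespace left in front of the marker.
    return text[:cut].rstrip()
```

Combining this with a hard cap on generated tokens bounds the worst case when no marker appears at all.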

Sao10K/MN-12B-Lyra-v1 Base
Sao10K/MN-12B-Lyra-v3 x2 Sequential PASS, order: 1, 3
unsloth/Mistral-Nemo-Instruct-2407 x1 Single PASS, order: 2
with z0.0001 value

Prompt format:

Mistral Instruct

```
[INST] System Message [/INST]

[INST] Name: Let's get started. Please respond based on the information and instructions provided above. [/INST]

<s>[INST] Name: What is your favourite condiment? [/INST]
AssistantName: Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s>
[INST] Name: Do you have mayonnaise recipes? [/INST]
```
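The layout above can be assembled programmatically. The following is a minimal sketch that reproduces the system block and the chat turns from the example; it leaves out the "Let's get started" kick-off turn for brevity. The function names, the `(user_text, assistant_text)` turn structure, and the literal `<s>` placement are assumptions read off the example, not a specification from the author. Many tokenizers add the beginning-of-sequence token themselves, so the literal `<s>` may need to be dropped depending on the frontend.

```python
def format_turns(turns, user_name="Name", assistant_name="AssistantName"):
    """Render (user_text, assistant_text) pairs; assistant_text may be None."""
    lines = []
    for user_text, assistant_text in turns:
        lines.append(f"[INST] {user_name}: {user_text} [/INST]")
        if assistant_text is not None:
            # Completed assistant replies are closed with the stop token.
            lines.append(f"{assistant_name}: {assistant_text}</s>")
    return "\n".join(lines)


def build_prompt(system_message, turns, user_name="Name",
                 assistant_name="AssistantName"):
    """Join the system block and the chat turns in the layout shown above."""
    header = f"[INST] {system_message} [/INST]"
    body = format_turns(turns, user_name, assistant_name)
    return f"{header}\n\n<s>{body}"
```

Passing `None` as the assistant text of the final turn leaves the prompt open for the model to answer, as in the last line of the example.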


Safetensors

Model size: 12B params

Tensor type: F16

Model tree for v000000/NM-12B-Lyris-dev-2

Merged from: Sao10K/MN-12B-Lyra-v1, Sao10K/MN-12B-Lyra-v3, unsloth/Mistral-Nemo-Instruct-2407

Quantizations: 2 models