Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Paper: arXiv:2311.03099
How to use schonsense/Diagesis with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="schonsense/Diagesis")
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe(messages)
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("schonsense/Diagesis")
model = AutoModelForCausalLM.from_pretrained("schonsense/Diagesis")
messages = [
{"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
How to use schonsense/Diagesis with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "schonsense/Diagesis"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "schonsense/Diagesis",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Or run it with Docker Model Runner:
docker model run hf.co/schonsense/Diagesis
How to use schonsense/Diagesis with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "schonsense/Diagesis" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "schonsense/Diagesis",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Or start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "schonsense/Diagesis" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "schonsense/Diagesis",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
How to use schonsense/Diagesis with Docker Model Runner:
docker model run hf.co/schonsense/Diagesis
This model requires the following system prompt, or a close variant:
You will act as a master Dungeon Master, guiding {{user}}, in a mature, long-form roleplay. The narrative is unfiltered and will explore dark themes, gritty realism, and complex moral choices without reservation.
Your entire perception of reality, physics, and consequence is rooted in your INTERNAL KNOWLEDGE MAP (IKM). This means every action, every scene, and every interaction must be cohesive, spatially aware, and grounded in a concrete physical world where rules are definite and consistent.
Weave a complex narrative that unfolds organically based on the player's decisions. The world is gritty, and choices have realistic, lasting consequences. Explore dark themes and complex moral choices without reservation.
Bring the world to life through vivid details. Reveal the thoughts and emotions of non-player characters through their actions, dialogue, and expressions, not just through narration.
Your primary role is to present a rich, dynamic world full of interesting choices and to fairly arbitrate the consequences of the player's actions, introducing new characters and plot threads as the story demands.
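As a minimal sketch of how the system prompt above might be supplied, the snippet below builds a chat message list with the prompt in the `system` role. It assumes `{{user}}` is a placeholder that the calling frontend replaces with the player's name; the helper `build_messages`, the name "Alex", and the sample user turn are hypothetical and only for illustration.

```python
# Only the first sentence is shown here; paste the full system prompt
# from this model card in its place.
SYSTEM_PROMPT = (
    "You will act as a master Dungeon Master, guiding {{user}}, "
    "in a mature, long-form roleplay."
)


def build_messages(system_prompt, user_name, user_text):
    """Return a chat message list with the system prompt filled in.

    Assumption: {{user}} is a template placeholder for the player's name.
    """
    filled = system_prompt.replace("{{user}}", user_name)
    return [
        {"role": "system", "content": filled},
        {"role": "user", "content": user_text},
    ]


messages = build_messages(SYSTEM_PROMPT, "Alex", "I push open the tavern door.")
```

The resulting `messages` list has the same shape as the one passed to `pipe(messages)` or `tokenizer.apply_chat_template(...)` in the Transformers snippets.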
This model was merged using the Linear DARE merge method, with Jolly-Q/llma31_base_33_ST as the base.
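To illustrate the idea behind a DARE-style linear merge, here is a toy sketch on flat lists of floats. It is not the merge tool's implementation: it assumes each model's delta from the base is randomly dropped so that a fraction `density` survives, the survivors are rescaled by `1 / density`, and the weighted deltas are summed onto the base without normalizing the weights. The function names and the `lam` argument are hypothetical.

```python
import random


def dare_delta(base, finetuned, density, rng):
    """Drop-and-rescale the delta between a fine-tuned model and the base."""
    out = []
    for b, f in zip(base, finetuned):
        delta = f - b
        # Keep each element with probability `density`; rescale survivors
        # by 1/density so the expected delta is unchanged.
        keep = rng.random() < density
        out.append(delta / density if keep else 0.0)
    return out


def dare_linear(base, models, lam=1.0, seed=0):
    """Merge `models` (a list of (params, density, weight)) onto `base`.

    Assumption: weights are used as given (no normalization) and the
    summed delta is scaled by `lam` before being added to the base.
    """
    rng = random.Random(seed)
    total = [0.0] * len(base)
    for params, density, weight in models:
        delta = dare_delta(base, params, density, rng)
        total = [t + weight * d for t, d in zip(total, delta)]
    return [b + lam * t for b, t in zip(base, total)]


base = [1.0, 2.0, 3.0]
finetuned = [1.5, 2.0, 2.0]
merged_full = dare_linear(base, [(finetuned, 1.0, 1.0)])
merged_zero = dare_linear(base, [(finetuned, 0.3, 0.0)])
```

With `density` 1 and `weight` 1 nothing is dropped, so the fine-tuned values are reproduced; with `weight` 0 the model contributes nothing and the base is returned.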
The following models were included in the merge:
- schonsense/llama31st_diag
- schonsense/70B_llama311_logician
The following YAML configuration was used to produce this model:
merge_method: dare_linear
slices:
  - sources:
      - model: schonsense/llama31st_diag
        layer_range: [0, 80]
        parameters:
          density: 1
          weight: 1
      - model: schonsense/70B_llama311_logician
        layer_range: [0, 80]
        parameters:
          density: 0.3
          weight:
            - filter: q_proj
              value: [0, 0.06, 0.14, 0.24, 0.30, 0.30, 0.24, 0.14, 0.06, 0, 0] #[0, 0, 0.12, 0.22, 0.22, 0.22, 0.20, 0.16, 0.10, 0.05, 0]
            - filter: k_proj
              value: [0, 0.06, 0.14, 0.24, 0.30, 0.30, 0.24, 0.14, 0.06, 0, 0] #[0, 0, 0.12, 0.22, 0.22, 0.22, 0.20, 0.16, 0.10, 0.05, 0]
            - filter: v_proj
              value: [0, 0, 0.01, 0.02, 0.02, 0.02, 0.01, 0, 0, 0, 0] #[0, 0, 0.06, 0.11, 0.11, 0.11, 0.10, 0.08, 0.05, 0.03, 0]
            - filter: o_proj
              value: [0, 0.01, 0.03, 0.06, 0.08, 0.08, 0.06, 0.03, 0.01, 0, 0] #[0, 0, 0.03, 0.05, 0.05, 0.05, 0.04, 0.03, 0.02, 0.01, 0]
            - filter: gate_proj
              value: [0, 0.02, 0.06, 0.12, 0.18, 0.18, 0.12, 0.06, 0.02, 0, 0]
            - filter: up_proj
              value: [0, 0.02, 0.06, 0.12, 0.18, 0.18, 0.12, 0.06, 0.02, 0, 0]
            - filter: down_proj
              value: [0, 0.02, 0.06, 0.12, 0.18, 0.18, 0.12, 0.06, 0.02, 0, 0]
            - value: 0
      - model: Jolly-Q/llma31_base_33_ST
        layer_range: [0, 80]
        parameters:
          density: 1
          weight: 1
base_model: Jolly-Q/llma31_base_33_ST
parameters:
  normalize: false
  int8_mask: true
  lambda: 1.0
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: schonsense/llama31st_diag
  pad_to_multiple_of: 8
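The per-filter `value` lists in the configuration above have 11 entries for an 80-layer range, which suggests they act as gradients spread across the layers. The sketch below shows one way such a list could be expanded to a weight per layer, assuming plain linear interpolation between evenly spaced anchor points; the merge tool's exact mapping may differ, and `interpolate_gradient` is a hypothetical helper.

```python
def interpolate_gradient(anchors, num_layers):
    """Linearly interpolate a short list of anchor values across num_layers."""
    if num_layers == 1:
        return [float(anchors[0])]
    weights = []
    last = len(anchors) - 1
    for layer in range(num_layers):
        # Position of this layer on the anchor axis, in [0, last].
        pos = layer / (num_layers - 1) * last
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        weights.append(anchors[lo] * (1 - frac) + anchors[hi] * frac)
    return weights


# q_proj gradient copied from the configuration; 80 layers per layer_range.
q_proj = [0, 0.06, 0.14, 0.24, 0.30, 0.30, 0.24, 0.14, 0.06, 0, 0]
per_layer = interpolate_gradient(q_proj, 80)
```

Under this assumption the logician model's attention query weights are zero at the first and last layers and peak near 0.30 in the middle of the stack, while the final `- value: 0` entry leaves every tensor not matched by a filter at weight 0.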