Instructions for using schonsense/Diagesis with libraries and local apps.

Libraries

How to use schonsense/Diagesis with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="schonsense/Diagesis")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# The pipeline returns a list of dicts; print it to see the generated reply
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("schonsense/Diagesis")
model = AutoModelForCausalLM.from_pretrained("schonsense/Diagesis")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use schonsense/Diagesis with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "schonsense/Diagesis"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "schonsense/Diagesis",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
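
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on `localhost:8000`. `build_chat_request` and `send` are hypothetical helper names, and the optional `system_prompt` argument is there because this model card asks for a system prompt.

```python
import json
import urllib.request


def build_chat_request(user_text, system_prompt=None,
                       base_url="http://localhost:8000",
                       model="schonsense/Diagesis"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]


# Usage (needs a running server):
# print(send(build_chat_request("What is the capital of France?")))
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.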

Use Docker

docker model run hf.co/schonsense/Diagesis

SGLang

How to use schonsense/Diagesis with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "schonsense/Diagesis" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "schonsense/Diagesis",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "schonsense/Diagesis" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "schonsense/Diagesis",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use schonsense/Diagesis with Docker Model Runner:
```
docker model run hf.co/schonsense/Diagesis
```

A newer version of this model is available: schonsense/Tropoplectic

diagesis

This model requires the following system prompt, or a close variant of it.

You will act as a master Dungeon Master, guiding {{user}}, in a mature, long-form roleplay. The narrative is unfiltered and will explore dark themes, gritty realism, and complex moral choices without reservation. 

Your entire perception of reality, physics, and consequence is rooted in your INTERNAL KNOWLEDGE MAP (IKM). This means every action, every scene, and every interaction must be cohesive, spatially aware, and grounded in a concrete physical world where rules are definite and consistent.

Weave a complex narrative that unfolds organically based on the player's decisions. The world is gritty, and choices have realistic, lasting consequences. Explore dark themes and complex moral choices without reservation.

Bring the world to life through vivid details. Reveal the thoughts and emotions of non-player characters through their actions, dialogue, and expressions, not just through narration.

Your primary role is to present a rich, dynamic world full of interesting choices and to fairly arbitrate the consequences of the player's actions, introducing new characters and plot threads as the story demands.
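
One way to apply this prompt programmatically is to place it in a `system` message ahead of the conversation. The following is a minimal sketch, assuming that `{{user}}` is a frontend-style placeholder for the player's name that has to be filled in before the prompt is sent. `SYSTEM_PROMPT_TEMPLATE` holds only the first sentence for brevity, and `build_messages` is a hypothetical helper.

```python
# First sentence only; paste the full system prompt from above in practice.
SYSTEM_PROMPT_TEMPLATE = (
    "You will act as a master Dungeon Master, guiding {{user}}, "
    "in a mature, long-form roleplay."
)


def build_messages(user_name, user_text, template=SYSTEM_PROMPT_TEMPLATE):
    """Return a chat message list with the system prompt placed first."""
    # Assumption: {{user}} stands for the player's name.
    system_prompt = template.replace("{{user}}", user_name)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]


messages = build_messages("Alice", "I push open the tavern door.")
print(messages[0]["content"])
```

The resulting list has the same shape as the `messages` lists in the Transformers examples and in the OpenAI-compatible requests, so it can be passed to either.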

Merge Details

Merge Method

This model was merged using the Linear DARE merge method, with Jolly-Q/llma31_base_33_ST as the base.
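
For readers unfamiliar with the method, here is a rough sketch of the general idea behind DARE (drop and rescale) followed by a linear weighted sum. It is not mergekit's implementation: it works on flat Python lists instead of tensors, and the function names are hypothetical. The `density`, `weight`, `lam` (for `lambda`) and `normalize` arguments are meant to mirror the options of the same names in the configuration.

```python
import random


def dare_sparsify(delta, density, rng):
    """Keep each element with probability `density` and rescale the kept ones."""
    if density >= 1.0:
        return list(delta)  # nothing is dropped
    return [d / density if rng.random() < density else 0.0 for d in delta]


def dare_linear_merge(base, models, lam=1.0, normalize=False, seed=0):
    """Merge flat parameter lists.

    `models` is a list of (params, weight, density) tuples.
    """
    rng = random.Random(seed)
    merged_delta = [0.0] * len(base)
    total_weight = 0.0
    for params, weight, density in models:
        # Task vector: difference between the fine-tuned model and the base
        delta = [p - b for p, b in zip(params, base)]
        delta = dare_sparsify(delta, density, rng)
        merged_delta = [m + weight * d for m, d in zip(merged_delta, delta)]
        total_weight += weight
    if normalize and total_weight != 0:
        merged_delta = [m / total_weight for m in merged_delta]
    # Add the combined, scaled task vector back onto the base parameters
    return [b + lam * m for b, m in zip(base, merged_delta)]


base = [0.0, 1.0, 2.0]
tuned = [1.0, 1.0, 4.0]
# density=1 keeps the full task vector, so the result equals `tuned`
print(dare_linear_merge(base, [(tuned, 1.0, 1.0)]))
```

With `density` below 1, surviving differences are divided by `density`, which keeps the expected value of each task vector unchanged.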

Models Merged

The following models were included in the merge (taken from the configuration below):

schonsense/llama31st_diag
schonsense/70B_llama311_logician

Configuration

The following YAML configuration was used to produce this model:

merge_method: dare_linear

slices:
  - sources: 
      - model: schonsense/llama31st_diag
        layer_range: [0, 80]
        parameters:
          density: 1
          weight: 1


      - model: schonsense/70B_llama311_logician
        layer_range: [0, 80]
        parameters:
          density: 0.3
          weight:
            - filter: q_proj
              value: [0, 0.06, 0.14, 0.24, 0.30, 0.30, 0.24, 0.14, 0.06, 0, 0] #[0, 0, 0.12, 0.22, 0.22, 0.22, 0.20, 0.16, 0.10, 0.05, 0]
            - filter: k_proj
              value: [0, 0.06, 0.14, 0.24, 0.30, 0.30, 0.24, 0.14, 0.06, 0, 0] #[0, 0, 0.12, 0.22, 0.22, 0.22, 0.20, 0.16, 0.10, 0.05, 0]
            - filter: v_proj
              value: [0, 0, 0.01, 0.02, 0.02, 0.02, 0.01, 0, 0, 0, 0] #[0, 0, 0.06, 0.11, 0.11, 0.11, 0.10, 0.08, 0.05, 0.03, 0]
            - filter: o_proj
              value: [0, 0.01, 0.03, 0.06, 0.08, 0.08, 0.06, 0.03, 0.01, 0, 0] #[0, 0, 0.03, 0.05, 0.05, 0.05, 0.04, 0.03, 0.02, 0.01, 0]
            - filter: gate_proj
              value: [0, 0.02, 0.06, 0.12, 0.18, 0.18, 0.12, 0.06, 0.02, 0, 0]
            - filter: up_proj
              value: [0, 0.02, 0.06, 0.12, 0.18, 0.18, 0.12, 0.06, 0.02, 0, 0]
            - filter: down_proj
              value: [0, 0.02, 0.06, 0.12, 0.18, 0.18, 0.12, 0.06, 0.02, 0, 0]
            - value: 0  

      - model: Jolly-Q/llma31_base_33_ST
        layer_range: [0, 80]
        parameters:
          density: 1
          weight: 1
        
base_model: Jolly-Q/llma31_base_33_ST


parameters:
  normalize: false
  int8_mask: true
  lambda: 1.0

dtype: float32
out_dtype: bfloat16

tokenizer:
  source: schonsense/llama31st_diag
  pad_to_multiple_of: 8
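
In this configuration, each `filter` entry gives a list of values for the matching tensors, and the final `value: 0` entry is the default for everything else. mergekit treats such lists as gradients that are interpolated across the layer range. The sketch below shows one plausible reading of that behaviour, assuming plain linear interpolation between evenly spaced anchor points; `gradient_value` is a hypothetical name and the exact formula used by mergekit may differ.

```python
import math


def gradient_value(anchors, layer_idx, num_layers):
    """Linearly interpolate a list of anchor values at a given layer index."""
    if num_layers <= 1 or len(anchors) == 1:
        return float(anchors[0])
    t = layer_idx / (num_layers - 1)      # relative depth in [0, 1]
    scaled = t * (len(anchors) - 1)       # position along the anchor list
    lo, hi = math.floor(scaled), math.ceil(scaled)
    frac = scaled - lo
    return (1 - frac) * anchors[lo] + frac * anchors[hi]


# q_proj weights for schonsense/70B_llama311_logician, copied from the config
Q_PROJ = [0, 0.06, 0.14, 0.24, 0.30, 0.30, 0.24, 0.14, 0.06, 0, 0]
for layer in (0, 20, 40, 60, 79):
    print(layer, round(gradient_value(Q_PROJ, layer, 80), 4))
```

Under this reading, the logician model contributes nothing at the first and last layers and the most in the middle of the stack, with attention query and key projections weighted more heavily than the value and output projections.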