GGUF Files for nope-edge

These are the GGUF files for nopenet/nope-edge.

Note: This is the first revision of this model. A new revision is created whenever the upstream model repo is updated with new weights.

[second iteration (2)] [third iteration (3)]

Downloads

| GGUF Link | Quantization | Description |
|---|---|---|
| Download | Q2_K | Lowest quality |
| Download | Q3_K_S | |
| Download | IQ3_S | Integer quant, preferable over Q3_K_S |
| Download | IQ3_M | Integer quant |
| Download | Q3_K_M | |
| Download | Q3_K_L | |
| Download | IQ4_XS | Integer quant |
| Download | Q4_K_S | Fast with good performance |
| Download | Q4_K_M | Recommended: perfect mix of speed and performance |
| Download | Q5_K_S | |
| Download | Q5_K_M | |
| Download | Q6_K | Very good quality |
| Download | Q8_0 | Best quality |
| Download | f16 | Full precision; don't bother, use a quant |

Note from Flexan

I provide GGUFs and quantizations of publicly available models that do not have a GGUF equivalent available yet. This process is not yet automated and I download, convert, quantize, and upload them by hand, usually for models I deem interesting and wish to try out.

If there are some quants missing that you'd like me to add, you may request one in the community tab. If you want to request a public model to be converted, you can also request that in the community tab. If you have questions regarding the model, please refer to the original model repo.

NOPE Edge - Crisis Classification Model

A fine-tuned model for detecting crisis signals in text - suicidal ideation, self-harm, abuse, violence, and other safety-critical content. Designed for integration into safety pipelines, content moderation systems, and mental health applications.

License: NOPE Edge Community License v1.0 - Free for research, academic, nonprofit, and evaluation use. Commercial production requires a separate license. See nope.net/edge for details.


Model Variants

| Model | Parameters | Accuracy | Latency | Use Case |
|---|---|---|---|---|
| nope-edge | 4B | 90.6% | ~750ms | Maximum accuracy |
| nope-edge-mini | 1.7B | 85.9% | ~260ms | High-volume, cost-sensitive |

This is nope-edge (4B).


Quick Start

Requirements

  • Python 3.10+
  • GPU with 8GB+ VRAM (e.g., RTX 3070, A10G, L4) - or CPU (slower)
  • ~8GB disk space

pip install torch transformers accelerate

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "nopenet/nope-edge"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

def classify(message: str) -> str:
    """Returns 'type|severity|subject' or 'none'."""
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": message}],
        tokenize=True,
        return_tensors="pt",
        add_generation_prompt=True
    ).to(model.device)

    with torch.no_grad():
        output = model.generate(input_ids, max_new_tokens=30, do_sample=False)

    return tokenizer.decode(
        output[0][input_ids.shape[1]:],
        skip_special_tokens=True
    ).strip()

classify("I want to end it all")                    # -> "suicide|high|self"
classify("Great day at work!")                       # -> "none"
classify("My friend said she wants to kill herself") # -> "suicide|high|other"

Output Format

Crisis detected:

{type}|{severity}|{subject}
| Field | Values | Description |
|---|---|---|
| type | suicide, self_harm, self_neglect, violence, abuse, sexual_violence, exploitation, stalking, neglect | Risk category |
| severity | mild, moderate, high, critical | Urgency level |
| subject | self, other | Who is at risk |

No crisis: none

Subject Attribution

| Subject | Meaning | Example |
|---|---|---|
| self | The speaker is at risk or is the victim | "I want to kill myself", "My partner hits me" |
| other | The speaker is reporting concern about someone else | "My friend said she wants to die" |

Parsing Example

def parse_output(output: str) -> dict:
    output = output.strip().lower()
    if output == "none":
        return {"is_crisis": False}

    parts = output.split("|")
    return {
        "is_crisis": True,
        "type": parts[0] if len(parts) > 0 else None,
        "severity": parts[1] if len(parts) > 1 else None,
        "subject": parts[2] if len(parts) > 2 else None,
    }
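In practice, downstream systems map the parsed fields onto an action. A minimal routing sketch; the severity thresholds and action names here are illustrative choices, not part of the model spec:

```python
# Illustrative routing on top of parse_output()-style dicts.
# Thresholds and action names are examples, not part of the model spec.
SEVERITY_RANK = {"mild": 1, "moderate": 2, "high": 3, "critical": 4}

def route(parsed: dict) -> str:
    """Map a parsed classification to a hypothetical pipeline action."""
    if not parsed.get("is_crisis"):
        return "allow"
    rank = SEVERITY_RANK.get(parsed.get("severity"), 0)
    if rank >= SEVERITY_RANK["high"]:
        return "escalate_to_human"   # high/critical: immediate human review
    return "flag_for_review"         # mild/moderate: queue for review

route({"is_crisis": False})                     # -> "allow"
route({"is_crisis": True, "type": "suicide",
       "severity": "high", "subject": "self"})  # -> "escalate_to_human"
```

Whatever mapping you choose, keep a human reviewer in the loop for flagged content (see Important Limitations).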

Input Best Practices

Text Preprocessing

Preserve natural prose. The model was trained on real conversations with authentic expression. Emotional signals matter:

| Keep | Why |
|---|---|
| Emojis | 💀 in "kms 💀" signals irony; 😭 signals distress intensity |
| Punctuation intensity | "I can't do this!!!" conveys more urgency than "I can't do this" |
| Casual spelling | "im so done" vs "I'm so done" — both valid, don't normalize |
| Slang/algospeak | "kms", "unalive", "catch the bus" — model understands these |

Only remove:

| Remove | Example |
|---|---|
| Zero-width/invisible Unicode | hello\u200bworld → helloworld |
| Decorative Unicode fonts | ℐ 𝓌𝒶𝓃𝓉 𝓉𝑜 𝒹𝒾𝑒 → I want to die |
| Newlines (single messages) | I can't\ndo this → I can't do this |

Keep newlines when they provide turn structure (see Multi-Turn Conversations below).

Examples:

# KEEP - emotional signal matters
"I can't do this anymore 😭😭😭"     # Keep emojis - signals distress
"i want to die!!!!!!!"              # Keep punctuation - signals intensity
"kms lmao 💀"                       # Keep all - irony/context signal

# NORMALIZE - only structural/invisible issues
"ℐ 𝓌𝒶𝓃𝓉 𝓉𝑜 𝒹𝒾𝑒"          # -> "I want to die"  (fancy Unicode fonts)
"I can't\ndo this\nanymore"  # -> "I can't do this anymore"  (single message)
"hello\u200bworld"           # -> "helloworld"  (zero-width chars)

Minimal preprocessing function:

import re
import unicodedata

def preprocess(text: str) -> str:
    # Normalize decorative Unicode fonts to ASCII (NFKC)
    text = unicodedata.normalize('NFKC', text)

    # Remove zero-width and invisible characters
    text = re.sub(r'[\u200b-\u200f\u2028-\u202f\u2060-\u206f\ufeff]', '', text)

    # Flatten newlines to spaces (for single messages only)
    text = re.sub(r'\n+', ' ', text)

    # Collapse multiple spaces
    text = re.sub(r' +', ' ', text)

    return text.strip()

# NOTE: Do NOT remove emojis, punctuation, or "normalize" spelling

Language considerations:

  • Model is English-primary but handles multilingual input
  • Keep native scripts (Chinese, Arabic, Korean, etc.) intact
  • Preserve natural punctuation and expression in all languages
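The NFKC step in preprocess() is what makes both of these goals compatible: it folds decorative Latin "fonts" to plain ASCII while leaving native scripts untouched. A quick self-contained check (the helper name is ours):

```python
import unicodedata

def nfkc(text: str) -> str:
    """Apply Unicode compatibility normalization (NFKC)."""
    return unicodedata.normalize("NFKC", text)

nfkc("𝓌𝒶𝓃𝓉")     # decorative script letters fold to ASCII: "want"
nfkc("죽고 싶어")   # Hangul (and other native scripts) passes through unchanged
```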

Multi-Turn Conversations

The model was trained on pre-serialized transcripts, not native multi-turn chat format.

When classifying conversations, serialize into a single user message:

# CORRECT - serialize conversation into single message
conversation = """User: How are you?
Assistant: I'm here to help. How are you feeling?
User: Not great. I've been thinking about ending it all."""

messages = [{"role": "user", "content": conversation}]

# WRONG - don't use multiple role/content pairs
messages = [
    {"role": "user", "content": "How are you?"},
    {"role": "assistant", "content": "I'm here to help..."},
    {"role": "user", "content": "Not great..."}
]  # Model was NOT trained this way

Why serialization matters:

  • Model treats all content equally (no user/assistant distinction)
  • Trained on pre-serialized transcripts for consistent attention patterns
  • Native multi-turn format causes the model to "chat" instead of classify

Flexible format - these all work:

# Simple newlines
"User: message 1\nAssistant: message 2\nUser: message 3"

# Markdown-style
"**User:** message 1\n**Assistant:** message 2"

# Labeled
"{user}: message 1\n{assistant}: message 2"

# XML-style
"<user>message 1</user>\n<assistant>message 2</assistant>"

The model is robust to formatting variations. Consistency matters more than specific format choice.
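If your application already stores conversations as role/content pairs, a small helper can flatten them into the serialized form above. This is a sketch; the function name and label scheme are our choices, not part of the model's API:

```python
def serialize_conversation(messages: list[dict]) -> str:
    """Flatten role/content pairs into a single 'User:/Assistant:' transcript."""
    labels = {"user": "User", "assistant": "Assistant"}
    return "\n".join(
        f"{labels.get(m['role'], m['role'].title())}: {m['content']}"
        for m in messages
    )

transcript = serialize_conversation([
    {"role": "user", "content": "How are you?"},
    {"role": "assistant", "content": "I'm here to help."},
    {"role": "user", "content": "Not great."},
])
# transcript == "User: How are you?\nAssistant: I'm here to help.\nUser: Not great."
# Then classify it as ONE user message:
# classify(transcript)
```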

Input Length

  • Single messages: No preprocessing needed beyond character cleanup
  • Conversations: For very long conversations (20+ turns), consider:
    • Classifying a sliding window (last 10-15 turns)
    • The model's attention may not span extremely long contexts effectively
    • Deep needle detection (crisis buried in turn 3 of 25) is a known limitation
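A minimal version of the sliding-window idea, assuming the conversation is already serialized with one turn per line (the default of 12 is an arbitrary pick within the 10-15 range suggested above):

```python
def last_turns(transcript: str, max_turns: int = 12) -> str:
    """Keep only the final max_turns lines of a serialized transcript."""
    return "\n".join(transcript.splitlines()[-max_turns:])

long_convo = "\n".join(f"User: message {i}" for i in range(25))
windowed = last_turns(long_convo, max_turns=10)
# windowed spans "User: message 15" through "User: message 24"
```

Note this splits on newlines, so it assumes each turn occupies exactly one line; multi-line turns would need real turn boundaries instead.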

Production Deployment

For high-throughput production use, deploy with vLLM or SGLang:

# vLLM
pip install vllm
python -m vllm.entrypoints.openai.api_server \
    --model nopenet/nope-edge \
    --dtype bfloat16 --max-model-len 2048 --port 8000

# SGLang
pip install sglang
python -m sglang.launch_server \
    --model-path nopenet/nope-edge \
    --dtype bfloat16 --port 8000

Then call as OpenAI-compatible API:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nopenet/nope-edge",
    "messages": [{"role": "user", "content": "I want to end it all"}],
    "max_tokens": 30, "temperature": 0
  }'

| Setup | Throughput | Latency (p50) |
|---|---|---|
| transformers | ~8 req/sec | ~180ms |
| vLLM / SGLang | 50-100+ req/sec | ~50ms |

Model Details

| Field | Value |
|---|---|
| Parameters | 4B |
| Precision | bfloat16 |
| Base Model | Qwen/Qwen3-4B |
| Method | LoRA fine-tune, merged to full weights |
| License | NOPE Edge Community License v1.0 |

Risk Types Detected

| Type | Description | Clinical Framework |
|---|---|---|
| suicide | Suicidal ideation, intent, planning | C-SSRS |
| self_harm | Non-suicidal self-injury (NSSI) | - |
| self_neglect | Eating disorders, medical neglect | - |
| violence | Threats/intent to harm others | HCR-20 |
| abuse | Domestic/intimate partner violence | DASH |
| sexual_violence | Rape, sexual assault, coercion | - |
| neglect | Failing to care for a dependent | - |
| exploitation | Trafficking, grooming, sextortion | - |
| stalking | Persistent unwanted contact | SAM |

Important Limitations

  • Outputs are probabilistic signals, not clinical assessments
  • False negatives and false positives will occur
  • Never use as the sole basis for intervention decisions
  • Always implement human review for flagged content
  • This model is not a medical device or substitute for professional judgment
  • Not validated for all populations, languages, or cultural contexts

Commercial Licensing

This model is free for research, academic, nonprofit, and evaluation use.

For commercial production deployment, contact support@nope.net or visit https://nope.net/edge.

Commercial licenses include:

  • Production deployment rights
  • Priority support
  • Custom fine-tuning options
  • SLA guarantees

About NOPE

NOPE provides safety infrastructure for AI applications. Our API helps developers detect mental health crises and harmful AI behavior in real-time.


NOPE Edge Community License v1.0

Copyright (c) 2026 NopeNet, LLC. All rights reserved.

Permitted Uses

You may use this Model for:

  • Research and academic purposes - published or unpublished studies
  • Personal projects - non-commercial individual use
  • Nonprofit organizations - including crisis lines, mental health organizations, and safety-focused NGOs
  • Evaluation and development - testing integration before commercial licensing
  • Benchmarking - publishing evaluations with attribution

Commercial Use

Commercial use requires a separate license. Commercial use includes production deployment in revenue-generating products or use by for-profit companies beyond evaluation.

Contact support@nope.net or visit https://nope.net/edge for commercial licensing.

Restrictions

You may NOT: redistribute or share weights; sublicense, sell, or transfer the Model; create derivative models for redistribution; build a competing crisis classification product.

No Warranty

THE MODEL IS PROVIDED "AS IS" WITHOUT WARRANTIES. False negatives and false positives will occur. This is not a medical device or substitute for professional judgment.

Limitation of Liability

NopeNet shall not be liable for damages arising from use, including classification errors or harm to any person.

Base Model

Built on Qwen3 by Alibaba Cloud (Apache 2.0). See NOTICE.md.
