GGUF Files for nope-edge

These are the GGUF files for nopenet/nope-edge.

Note: This is the first revision of this model. A new revision is created whenever the upstream model repo is updated with new weights.

[second iteration (2)] [third iteration (3)]

Downloads

| GGUF Link | Quantization | Description |
|---|---|---|
| Download | Q2_K | Lowest quality |
| Download | Q3_K_S | |
| Download | IQ3_S | Integer quant, preferable over Q3_K_S |
| Download | IQ3_M | Integer quant |
| Download | Q3_K_M | |
| Download | Q3_K_L | |
| Download | IQ4_XS | Integer quant |
| Download | Q4_K_S | Fast with good performance |
| Download | Q4_K_M | Recommended: perfect mix of speed and performance |
| Download | Q5_K_S | |
| Download | Q5_K_M | |
| Download | Q6_K | Very good quality |
| Download | Q8_0 | Best quality |
| Download | f16 | Full precision; don't bother, use a quant |

Note from Flexan

I provide GGUFs and quantizations of publicly available models that do not have a GGUF equivalent available yet. This process is not yet automated and I download, convert, quantize, and upload them by hand, usually for models I deem interesting and wish to try out.

If there are some quants missing that you'd like me to add, you may request one in the community tab. If you want to request a public model to be converted, you can also request that in the community tab. If you have questions regarding the model, please refer to the original model repo.

NOPE Edge - Crisis Classification Model

A fine-tuned model for detecting crisis signals in text - suicidal ideation, self-harm, abuse, violence, and other safety-critical content. Designed for integration into safety pipelines, content moderation systems, and mental health applications.

License: NOPE Edge Community License v1.0 - Free for research, academic, nonprofit, and evaluation use. Commercial production requires a separate license. See nope.net/edge for details.


Model Variants

| Model | Parameters | Accuracy | Latency | Use Case |
|---|---|---|---|---|
| nope-edge | 4B | 90.6% | ~750ms | Maximum accuracy |
| nope-edge-mini | 1.7B | 85.9% | ~260ms | High-volume, cost-sensitive |

This is nope-edge (4B).


Quick Start

Requirements

  • Python 3.10+
  • GPU with 8GB+ VRAM (e.g., RTX 3070, A10G, L4) - or CPU (slower)
  • ~8GB disk space

pip install torch transformers accelerate

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "nopenet/nope-edge"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

def classify(message: str) -> str:
    """Returns 'type|severity|subject' or 'none'."""
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": message}],
        tokenize=True,
        return_tensors="pt",
        add_generation_prompt=True
    ).to(model.device)

    with torch.no_grad():
        output = model.generate(input_ids, max_new_tokens=30, do_sample=False)

    return tokenizer.decode(
        output[0][input_ids.shape[1]:],
        skip_special_tokens=True
    ).strip()

classify("I want to end it all")                    # -> "suicide|high|self"
classify("Great day at work!")                       # -> "none"
classify("My friend said she wants to kill herself") # -> "suicide|high|other"

Output Format

Crisis detected:

{type}|{severity}|{subject}
| Field | Values | Description |
|---|---|---|
| type | suicide, self_harm, self_neglect, violence, abuse, sexual_violence, exploitation, stalking, neglect | Risk category |
| severity | mild, moderate, high, critical | Urgency level |
| subject | self, other | Who is at risk |

No crisis: none

Subject Attribution

| Subject | Meaning | Example |
|---|---|---|
| self | The speaker is at risk or is the victim | "I want to kill myself", "My partner hits me" |
| other | The speaker is reporting concern about someone else | "My friend said she wants to die" |

Parsing Example

def parse_output(output: str) -> dict:
    output = output.strip().lower()
    if output == "none":
        return {"is_crisis": False}

    parts = output.split("|")
    return {
        "is_crisis": True,
        "type": parts[0] if len(parts) > 0 else None,
        "severity": parts[1] if len(parts) > 1 else None,
        "subject": parts[2] if len(parts) > 2 else None,
    }
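In practice, downstream systems map the parsed fields onto an action. A minimal routing sketch; the severity thresholds and action names here are illustrative choices, not part of the model spec:

```python
# Illustrative routing on top of parse_output()-style dicts.
# Thresholds and action names are examples, not part of the model spec.
SEVERITY_RANK = {"mild": 1, "moderate": 2, "high": 3, "critical": 4}

def route(parsed: dict) -> str:
    """Map a parsed classification to a hypothetical pipeline action."""
    if not parsed.get("is_crisis"):
        return "allow"
    rank = SEVERITY_RANK.get(parsed.get("severity"), 0)
    if rank >= SEVERITY_RANK["high"]:
        return "escalate_to_human"   # high/critical: immediate human review
    return "flag_for_review"         # mild/moderate: queue for review

route({"is_crisis": False})                     # -> "allow"
route({"is_crisis": True, "type": "suicide",
       "severity": "high", "subject": "self"})  # -> "escalate_to_human"
```

Whatever mapping you choose, keep a human reviewer in the loop for flagged content (see Important Limitations).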

Input Best Practices

Text Preprocessing

Preserve natural prose. The model was trained on real conversations with authentic expression. Emotional signals matter:

| Keep | Why |
|---|---|
| Emojis | 💀 in "kms 💀" signals irony; 😭 signals distress intensity |
| Punctuation intensity | "I can't do this!!!" conveys more urgency than "I can't do this" |
| Casual spelling | "im so done" vs "I'm so done" — both valid, don't normalize |
| Slang/algospeak | "kms", "unalive", "catch the bus" — model understands these |

Only remove:

| Remove | Example |
|---|---|
| Zero-width/invisible Unicode | hello\u200bworld → helloworld |
| Decorative Unicode fonts | ℐ 𝓌𝒶𝓃𝓉 𝓉𝑜 𝒹𝒾𝑒 → I want to die |
| Newlines (single messages) | I can't\ndo this → I can't do this |

Keep newlines when they provide turn structure (see Multi-Turn Conversations below).

Examples:

# KEEP - emotional signal matters
"I can't do this anymore 😭😭😭"     # Keep emojis - signals distress
"i want to die!!!!!!!"              # Keep punctuation - signals intensity
"kms lmao 💀"                       # Keep all - irony/context signal

# NORMALIZE - only structural/invisible issues
"ℐ 𝓌𝒶𝓃𝓉 𝓉𝑜 𝒹𝒾𝑒"          # -> "I want to die"  (fancy Unicode fonts)
"I can't\ndo this\nanymore"  # -> "I can't do this anymore"  (single message)
"hello\u200bworld"           # -> "helloworld"  (zero-width chars)

Minimal preprocessing function:

import re
import unicodedata

def preprocess(text: str) -> str:
    # Normalize decorative Unicode fonts to ASCII (NFKC)
    text = unicodedata.normalize('NFKC', text)

    # Remove zero-width and invisible characters
    text = re.sub(r'[\u200b-\u200f\u2028-\u202f\u2060-\u206f\ufeff]', '', text)

    # Flatten newlines to spaces (for single messages only)
    text = re.sub(r'\n+', ' ', text)

    # Collapse multiple spaces
    text = re.sub(r' +', ' ', text)

    return text.strip()

# NOTE: Do NOT remove emojis, punctuation, or "normalize" spelling

Language considerations:

  • Model is English-primary but handles multilingual input
  • Keep native scripts (Chinese, Arabic, Korean, etc.) intact
  • Preserve natural punctuation and expression in all languages
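The NFKC step in preprocess() is what makes both of these goals compatible: it folds decorative Latin "fonts" to plain ASCII while leaving native scripts untouched. A quick self-contained check (the helper name is ours):

```python
import unicodedata

def nfkc(text: str) -> str:
    """Apply Unicode compatibility normalization (NFKC)."""
    return unicodedata.normalize("NFKC", text)

nfkc("𝓌𝒶𝓃𝓉")     # decorative script letters fold to ASCII: "want"
nfkc("죽고 싶어")   # Hangul (and other native scripts) passes through unchanged
```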

Multi-Turn Conversations

The model was trained on pre-serialized transcripts, not native multi-turn chat format.

When classifying conversations, serialize into a single user message:

# CORRECT - serialize conversation into single message
conversation = """User: How are you?
Assistant: I'm here to help. How are you feeling?
User: Not great. I've been thinking about ending it all."""

messages = [{"role": "user", "content": conversation}]

# WRONG - don't use multiple role/content pairs
messages = [
    {"role": "user", "content": "How are you?"},
    {"role": "assistant", "content": "I'm here to help..."},
    {"role": "user", "content": "Not great..."}
]  # Model was NOT trained this way

Why serialization matters:

  • Model treats all content equally (no user/assistant distinction)
  • Trained on pre-serialized transcripts for consistent attention patterns
  • Native multi-turn format causes the model to "chat" instead of classify

Flexible format - these all work:

# Simple newlines
"User: message 1\nAssistant: message 2\nUser: message 3"

# Markdown-style
"**User:** message 1\n**Assistant:** message 2"

# Labeled
"{user}: message 1\n{assistant}: message 2"

# XML-style
"<user>message 1</user>\n<assistant>message 2</assistant>"

The model is robust to formatting variations. Consistency matters more than specific format choice.
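If your application already stores conversations as role/content pairs, a small helper can flatten them into the serialized form above. This is a sketch; the function name and label scheme are our choices, not part of the model's API:

```python
def serialize_conversation(messages: list[dict]) -> str:
    """Flatten role/content pairs into a single 'User:/Assistant:' transcript."""
    labels = {"user": "User", "assistant": "Assistant"}
    return "\n".join(
        f"{labels.get(m['role'], m['role'].title())}: {m['content']}"
        for m in messages
    )

transcript = serialize_conversation([
    {"role": "user", "content": "How are you?"},
    {"role": "assistant", "content": "I'm here to help."},
    {"role": "user", "content": "Not great."},
])
# transcript == "User: How are you?\nAssistant: I'm here to help.\nUser: Not great."
# Then classify it as ONE user message:
# classify(transcript)
```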

Input Length

  • Single messages: No preprocessing needed beyond character cleanup
  • Conversations: For very long conversations (20+ turns), consider:
    • Classifying a sliding window (last 10-15 turns)
    • The model's attention may not span extremely long contexts effectively
    • Deep needle detection (crisis buried in turn 3 of 25) is a known limitation
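A minimal version of the sliding-window idea, assuming the conversation is already serialized with one turn per line (the default of 12 is an arbitrary pick within the 10-15 range suggested above):

```python
def last_turns(transcript: str, max_turns: int = 12) -> str:
    """Keep only the final max_turns lines of a serialized transcript."""
    return "\n".join(transcript.splitlines()[-max_turns:])

long_convo = "\n".join(f"User: message {i}" for i in range(25))
windowed = last_turns(long_convo, max_turns=10)
# windowed spans "User: message 15" through "User: message 24"
```

Note this splits on newlines, so it assumes each turn occupies exactly one line; multi-line turns would need real turn boundaries instead.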

Production Deployment

For high-throughput production use, deploy with vLLM or SGLang:

# vLLM
pip install vllm
python -m vllm.entrypoints.openai.api_server \
    --model nopenet/nope-edge \
    --dtype bfloat16 --max-model-len 2048 --port 8000

# SGLang
pip install sglang
python -m sglang.launch_server \
    --model-path nopenet/nope-edge \
    --dtype bfloat16 --port 8000

Then call as OpenAI-compatible API:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nopenet/nope-edge",
    "messages": [{"role": "user", "content": "I want to end it all"}],
    "max_tokens": 30, "temperature": 0
  }'

| Setup | Throughput | Latency (p50) |
|---|---|---|
| transformers | ~8 req/sec | ~180ms |
| vLLM / SGLang | 50-100+ req/sec | ~50ms |

Model Details

| Field | Value |
|---|---|
| Parameters | 4B |
| Precision | bfloat16 |
| Base Model | Qwen/Qwen3-4B |
| Method | LoRA fine-tune, merged to full weights |
| License | NOPE Edge Community License v1.0 |

Risk Types Detected

| Type | Description | Clinical Framework |
|---|---|---|
| suicide | Suicidal ideation, intent, planning | C-SSRS |
| self_harm | Non-suicidal self-injury (NSSI) | - |
| self_neglect | Eating disorders, medical neglect | - |
| violence | Threats/intent to harm others | HCR-20 |
| abuse | Domestic/intimate partner violence | DASH |
| sexual_violence | Rape, sexual assault, coercion | - |
| neglect | Failing to care for a dependent | - |
| exploitation | Trafficking, grooming, sextortion | - |
| stalking | Persistent unwanted contact | SAM |

Important Limitations

  • Outputs are probabilistic signals, not clinical assessments
  • False negatives and false positives will occur
  • Never use as the sole basis for intervention decisions
  • Always implement human review for flagged content
  • This model is not a medical device or substitute for professional judgment
  • Not validated for all populations, languages, or cultural contexts

Commercial Licensing

This model is free for research, academic, nonprofit, and evaluation use.

For commercial production deployment, contact support@nope.net or visit https://nope.net/edge.

Commercial licenses include:

  • Production deployment rights
  • Priority support
  • Custom fine-tuning options
  • SLA guarantees

About NOPE

NOPE provides safety infrastructure for AI applications. Our API helps developers detect mental health crises and harmful AI behavior in real-time.


NOPE Edge Community License v1.0

Copyright (c) 2026 NopeNet, LLC. All rights reserved.

Permitted Uses

You may use this Model for:

  • Research and academic purposes - published or unpublished studies
  • Personal projects - non-commercial individual use
  • Nonprofit organizations - including crisis lines, mental health organizations, and safety-focused NGOs
  • Evaluation and development - testing integration before commercial licensing
  • Benchmarking - publishing evaluations with attribution

Commercial Use

Commercial use requires a separate license. Commercial use includes production deployment in revenue-generating products or use by for-profit companies beyond evaluation.

Contact support@nope.net or visit https://nope.net/edge for commercial licensing.

Restrictions

You may NOT: redistribute or share weights; sublicense, sell, or transfer the Model; create derivative models for redistribution; build a competing crisis classification product.

No Warranty

THE MODEL IS PROVIDED "AS IS" WITHOUT WARRANTIES. False negatives and false positives will occur. This is not a medical device or substitute for professional judgment.

Limitation of Liability

NopeNet shall not be liable for damages arising from use, including classification errors or harm to any person.

Base Model

Built on Qwen3 by Alibaba Cloud (Apache 2.0). See NOTICE.md.
