SDG SFT Round-1 LoRA Adapter (v0.1)
A LoRA adapter on Qwen/Qwen3.5-9B-Base that emits valid JSON
compositions over the SDG (Synthesis Data Governance) 540-template
ontology catalog under xgrammar-based constrained decoding.
Status: v0.1, peer-review preview. Curator: @zndx
What it is
This adapter is the result of supervised fine-tuning on a 665-sample
self-distilled corpus (zndx/sdg-bertopic-correspondence-v0.1, corpus
version v2). The corpus was generated by rejection-sampling the base model
under xgrammar+full-schema constrained decoding and keeping only
completions scoring R ≥ 0.3 against a C1-locked verifier.
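The filtering step described above can be sketched as a plain rejection-sampling loop. This is a minimal illustration, not the project's pipeline: `generate` and `score` are hypothetical callables standing in for constrained generation and the verifier, and `samples_per_prompt` is an assumed parameter, since the card does not state how many completions were drawn per scenario when building the corpus.

```python
from typing import Callable, Iterable

R_THRESHOLD = 0.3  # acceptance threshold stated in the model card


def rejection_sample(
    prompts: Iterable[str],
    generate: Callable[[str], str],      # hypothetical: constrained generation from the base model
    score: Callable[[str, str], float],  # hypothetical: verifier returning R for (prompt, completion)
    samples_per_prompt: int = 4,         # assumption: not specified for corpus construction
    threshold: float = R_THRESHOLD,
) -> list:
    """Keep only completions whose verifier score R meets the threshold."""
    kept = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            completion = generate(prompt)
            r = score(prompt, completion)
            if r >= threshold:
                kept.append({"prompt": prompt, "completion": completion, "R": r})
    return kept


# Toy stand-ins so the sketch runs without a model or verifier.
def toy_generate(prompt: str) -> str:
    return prompt.upper()


def toy_score(prompt: str, completion: str) -> float:
    return len(completion) / 10


corpus = rejection_sample(["ab", "abcd"], toy_generate, toy_score, samples_per_prompt=2)
print(len(corpus), "completions kept")
```

With the toy scorer, only completions for the longer prompt clear the threshold; in the real setting the kept rows would become the SFT training examples.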
Headline result
Held-out 50-scenario evaluation, mean R across 4 generations per scenario:
| Stage | overall mean R | good_mean | bad_mean | R_A pass rate | AUC |
|---|---|---|---|---|---|
| Base (no adapter) | 0.208 | 0.205 | 0.210 | 0.55 | 0.478 |
| SFT-r1 (this adapter) | 0.289 | 0.311 | 0.268 | 0.68 | 0.590 |
A +39 % overall improvement from a single SFT round, with gains on both good (+51 %) and bad (+28 %) scenarios and an AUC lift from 0.478 to 0.590. The adapter slightly discriminates scenario quality, which the base model (AUC near chance) does not.
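The relative gains can be recomputed from the table as a quick check. The sketch below uses the rounded table entries, so the results may differ by about a percentage point from the figures quoted above, which were presumably derived from unrounded scores.

```python
# Mean R values copied from the results table (rounded to three decimals).
base = {"overall": 0.208, "good": 0.205, "bad": 0.210}
sft = {"overall": 0.289, "good": 0.311, "bad": 0.268}


def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, as a fraction."""
    return (after - before) / before


gains = {name: relative_gain(base[name], sft[name]) for name in base}
for name, gain in gains.items():
    print(f"{name}: {gain:+.1%}")

# Gap between good and bad scenarios: a rough view of how much the
# scores separate scenario quality before and after fine-tuning.
gap_base = base["good"] - base["bad"]
gap_sft = sft["good"] - sft["bad"]
print(f"good-bad gap: base {gap_base:+.3f}, SFT-r1 {gap_sft:+.3f}")
```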
Training details
| Hyperparameter | Value |
|---|---|
| Base model | Qwen/Qwen3.5-9B-Base |
| Trainable params | 29.1M / 8.98B (0.32 %) |
| LoRA rank r | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 2 |
| Grad accumulation | 8 |
| Per-device batch | 1 |
| Effective batch | 16 (across 2 GPUs, FSDP FULL_SHARD) |
| Total grad steps | 84 |
| Final train loss | 0.216 |
| Final token accuracy | 95.2 % |
| Final entropy | 0.194 |
| Wall time | 69.6 min on 2× RTX 4090 |
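
For readers who want to reproduce a comparable setup, the LoRA rows of the table map onto a PEFT `LoraConfig` roughly as follows. This is a sketch, not the project's training script: `task_type` is an assumption (the table does not state it), and the import is guarded so the settings remain readable even where `peft` is not installed.

```python
# LoRA settings copied from the hyperparameter table.
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not listed in the table
}

# Standard LoRA scales the low-rank update by lora_alpha / r.
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
print("LoRA scaling factor:", scaling)

try:
    from peft import LoraConfig

    lora_config = LoraConfig(**lora_kwargs)
except ImportError:
    lora_config = None  # peft not installed; the dict above still documents the settings
```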
Trained with `accelerate launch --use_fsdp --num_processes 2`, using FSDP
FULL_SHARD wrapped over `Qwen3_5DecoderLayer`. Hyperparameter choices follow
precedents from [InstructGPT, Llama-2 RLHF].
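The batch and step counts in the table are consistent with one another, which the short calculation below illustrates. It assumes that all 665 corpus samples were used for training and that the final partial batch of each epoch counts as one optimizer step; neither is stated explicitly in the card.

```python
import math

# Figures from the card.
num_samples = 665
per_device_batch = 1
grad_accumulation = 8
num_gpus = 2
epochs = 2
trainable_params = 29.1e6
total_params = 8.98e9

# Effective batch = per-device batch x accumulation steps x data-parallel workers.
effective_batch = per_device_batch * grad_accumulation * num_gpus

# Assumption: a trailing partial batch still triggers an optimizer step.
steps_per_epoch = math.ceil(num_samples / effective_batch)
total_steps = steps_per_epoch * epochs

trainable_fraction = trainable_params / total_params

print(f"effective batch: {effective_batch}")
print(f"total optimizer steps: {total_steps}")
print(f"trainable fraction: {trainable_fraction:.2%}")
```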
Use
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
# Load the tokenizer and the base model, then attach the adapter.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-9B-Base")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-9B-Base", torch_dtype=torch.bfloat16
).to("cuda")
model = PeftModel.from_pretrained(model, "zndx/sdg-sft-r1")
model.eval()
# Recommended: use with xgrammar-based constrained decoding against
# the SDG composition JSON schema. See the project's
# `make_xgrammar_logits_processor_factory` helper for the canonical
# wiring.
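For readers without access to the project helper, the following is a rough sketch of how xgrammar's Hugging Face logits processor is typically wired into `generate`. It is not the project's `make_xgrammar_logits_processor_factory`: the function names are hypothetical, `PLACEHOLDER_SCHEMA` is a stand-in rather than the real SDG composition schema, and `vocab_size` is left to the caller because it should match the model's output logits dimension.

```python
import json

# Placeholder only: the real SDG composition schema is not reproduced here.
PLACEHOLDER_SCHEMA = {
    "type": "object",
    "properties": {"templates": {"type": "array", "items": {"type": "string"}}},
    "required": ["templates"],
}


def make_logits_processor(tokenizer, vocab_size: int, schema: dict):
    """Build a fresh xgrammar logits processor for a single generate() call."""
    # Imported lazily so this module loads even where xgrammar is not installed.
    import xgrammar as xgr
    from xgrammar.contrib.hf import LogitsProcessor

    tokenizer_info = xgr.TokenizerInfo.from_huggingface(tokenizer, vocab_size=vocab_size)
    compiler = xgr.GrammarCompiler(tokenizer_info)
    compiled_grammar = compiler.compile_json_schema(json.dumps(schema))
    return LogitsProcessor(compiled_grammar)


def generate_json(model, tokenizer, prompt: str, schema: dict,
                  vocab_size: int, max_new_tokens: int = 512) -> str:
    """Generate a completion constrained to the given JSON schema."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    processor = make_logits_processor(tokenizer, vocab_size, schema)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        logits_processor=[processor],
    )
    # Strip the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

The processor tracks grammar state during decoding, so this sketch builds a new one per call rather than reusing a single instance, which is likely why the project exposes a factory.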
Related artifacts
- zndx/sdg-bertopic-correspondence-v0.1: the SFT corpus + scoring + topic-alignment data.
- zndx/sdg-sft-r2: second SFT round, demonstrates diminishing returns + mode collapse.
Citation
@misc{sdg-sft-r1-v01,
title = {SDG SFT Round-1 LoRA Adapter (v0.1)},
author = {Hill, Ryan and contributors},
year = {2026},
url = {https://huggingface.co/zndx/sdg-sft-r1}
}