SDG SFT Round-1 LoRA Adapter (v0.1)

A LoRA adapter on Qwen/Qwen3.5-9B-Base that emits valid JSON compositions over the SDG (Synthesis Data Governance) 540-template ontology catalog under xgrammar-based constrained decoding.

Status: v0.1, peer-review preview. Curator: @zndx

What it is

This adapter is the result of supervised fine-tuning on a 665-sample self-distilled corpus (zndx/sdg-bertopic-correspondence-v0.1, corpus version v2). The corpus was generated by rejection-sampling the base model under xgrammar constrained decoding with the full schema, keeping only completions that score R ≥ 0.3 against a C1-locked verifier.
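The rejection-sampling filter described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: `generate` and `score` are hypothetical callables standing in for the constrained-decoding sampler and the verifier, and `samples_per_prompt` is an assumed parameter that the card does not state. Only the 0.3 threshold comes from the text.

```python
from typing import Callable, List

R_THRESHOLD = 0.3  # acceptance threshold stated in the card


def rejection_sample(
    prompts: List[str],
    generate: Callable[[str], str],       # hypothetical: constrained-decoding sampler
    score: Callable[[str, str], float],   # hypothetical: verifier returning R
    samples_per_prompt: int = 4,          # assumed value, not from the card
) -> List[dict]:
    """Keep only completions whose verifier score R meets the threshold."""
    kept = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            completion = generate(prompt)
            r = score(prompt, completion)
            if r >= R_THRESHOLD:
                kept.append({"prompt": prompt, "completion": completion, "R": r})
    return kept
```

Because the kept samples come from the base model itself, the resulting corpus is self-distilled: fine-tuning shifts probability mass toward the completions the verifier already accepts.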

Headline result

Held-out 50-scenario evaluation, mean R across 4 generations per scenario:

Stage	overall mean R	good_mean	bad_mean	R_A pass rate	AUC
Base (no adapter)	0.208	0.205	0.210	0.55	0.478
SFT-r1 (this adapter)	0.289	0.311	0.268	0.68	0.590
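One plausible reading of the table's columns, sketched under assumptions: per-scenario R is the mean over that scenario's generations, and AUC is the probability that a randomly chosen good scenario outscores a randomly chosen bad one. The card does not define AUC, so the pairwise formulation below is an assumption, and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def scenario_means(r_by_scenario: Dict[str, List[float]]) -> Dict[str, float]:
    """Average the R scores of each scenario's generations."""
    return {sid: mean(rs) for sid, rs in r_by_scenario.items()}


def pairwise_auc(good: List[float], bad: List[float]) -> float:
    """Fraction of (good, bad) pairs where good scores higher; ties count 0.5."""
    wins = 0.0
    for g in good:
        for b in bad:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(good) * len(bad))
```

Under this reading, a value of 0.5 means the scores carry no information about scenario quality, and values above 0.5 mean good scenarios tend to score higher.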

A +39 % relative improvement in overall mean R from a single SFT round, with gains on both good (+52 %) and bad (+28 %) scenarios, and an AUC lift from 0.478 to 0.590. The adapter weakly discriminates scenario quality, whereas the base model sits at roughly chance level.
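The relative gains quoted above follow directly from the table. A short check, using only the numbers in the Base and SFT-r1 rows (the variable names are illustrative):

```python
# Mean R values copied from the results table.
base = {"overall": 0.208, "good": 0.205, "bad": 0.210}
sft_r1 = {"overall": 0.289, "good": 0.311, "bad": 0.268}

# Relative gain in percent: (after - before) / before * 100.
gains = {
    key: round((sft_r1[key] - base[key]) / base[key] * 100, 1)
    for key in base
}

for key, pct in gains.items():
    print(f"{key}: +{pct} %")
```

The gain on good scenarios is nearly twice the gain on bad ones, which is consistent with the AUC moving above 0.5.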

Training details

Hyperparameter	Value
Base model	`Qwen/Qwen3.5-9B-Base`
Trainable params	29.1M / 8.98B (0.32 %)
LoRA rank `r`	16
LoRA `alpha`	32
LoRA dropout	0.05
Target modules	`q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`
Epochs	2
Grad accumulation	8
Per-device batch	1
Effective batch	16 (across 2 GPUs, FSDP FULL_SHARD)
Total grad steps	84
Final train loss	0.216
Final token accuracy	95.2 %
Final entropy	0.194
Wall time	69.6 min on 2× RTX 4090
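The LoRA rows of the table map onto a PEFT `LoraConfig` roughly as shown below. This is a sketch rather than the training script itself: `task_type` is an assumption (the card does not state it), and the import is guarded so the snippet still runs where `peft` is not installed.

```python
# Values copied from the hyperparameter table; task_type is an assumption.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]

try:
    from peft import LoraConfig

    lora_config = LoraConfig(**LORA_KWARGS)
except ImportError:
    lora_config = None  # peft not installed; the kwargs above still document the setup
```

With `alpha = 32` and `r = 16`, the adapter update is scaled by 2.0, a common default pairing.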

Trained with `accelerate launch --use_fsdp --num_processes 2`, using FSDP FULL_SHARD over `Qwen3_5DecoderLayer`. Hyperparameter choices follow precedents from [InstructGPT, Llama-2 RLHF].
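The batch and step counts in the table are mutually consistent, which a few lines of arithmetic confirm. The only assumption here is that the final partial batch of each epoch still counts as an optimizer step.

```python
import math

# Values copied from the card.
num_samples = 665
per_device_batch = 1
grad_accum = 8
num_gpus = 2
epochs = 2

# Effective batch = per-device batch x accumulation steps x number of processes.
effective_batch = per_device_batch * grad_accum * num_gpus

# Assumes the last partial batch of each epoch is kept rather than dropped.
steps_per_epoch = math.ceil(num_samples / effective_batch)
total_steps = steps_per_epoch * epochs

# Share of trainable parameters, in percent.
trainable_pct = 29.1e6 / 8.98e9 * 100

print(effective_batch, steps_per_epoch, total_steps, round(trainable_pct, 2))
```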

Use

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-9B-Base")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-9B-Base", torch_dtype=torch.bfloat16
).to("cuda")
model = PeftModel.from_pretrained(model, "zndx/sdg-sft-r1")
model.eval()

# Recommended: use with xgrammar-based constrained decoding against
# the SDG composition JSON schema. See the project's
# `make_xgrammar_logits_processor_factory` helper for the canonical
# wiring.
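Even with constrained decoding, a completion can be cut off by the generation length limit, so it may be worth parsing the decoded text before using it. The helper below is a minimal standard-library sketch; `parse_composition` is a hypothetical name, and it assumes a composition is a top-level JSON object, which the card does not state explicitly.

```python
import json
from typing import Optional


def parse_composition(text: str) -> Optional[dict]:
    """Return the parsed JSON object, or None if the text is not a JSON object."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return None  # truncated or otherwise malformed output
    # Assumption: compositions are top-level JSON objects, not arrays or scalars.
    return value if isinstance(value, dict) else None
```

A `None` result is a signal to resample or raise the token budget rather than to pass the text downstream.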

Related artifacts

zndx/sdg-bertopic-correspondence-v0.1 — the SFT corpus + scoring + topic-alignment data.
zndx/sdg-sft-r2 — second SFT round, demonstrates diminishing returns + mode collapse.

Citation

@misc{sdg-sft-r1-v01,
  title  = {SDG SFT Round-1 LoRA Adapter (v0.1)},
  author = {Hill, Ryan and contributors},
  year   = {2026},
  url    = {https://huggingface.co/zndx/sdg-sft-r1}
}
