SDG SFT Round-1 LoRA Adapter (v0.1)

A LoRA adapter on Qwen/Qwen3.5-9B-Base that emits valid JSON compositions over the SDG (Synthesis Data Governance) 540-template ontology catalog under xgrammar-based constrained decoding.

Status: v0.1, peer-review preview. Curator: @zndx

What it is

The adapter is the result of supervised fine-tuning on a 665-sample self-distilled corpus (zndx/sdg-bertopic-correspondence-v0.1, corpus version v2). The corpus was generated by rejection-sampling the base model under xgrammar + full-schema constrained decoding and keeping only completions that score R ≥ 0.3 against a C1-locked verifier.
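The filtering step described above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: `generate` and `score` are hypothetical callables standing in for the constrained-decoding call and the verifier, and `samples_per_prompt` is an assumed parameter that the card does not state for corpus generation.

```python
from typing import Callable, Iterable

R_THRESHOLD = 0.3  # acceptance threshold stated in the card


def rejection_sample(
    prompts: Iterable[str],
    generate: Callable[[str], str],      # hypothetical: constrained-decoding call
    score: Callable[[str, str], float],  # hypothetical: verifier returning R
    samples_per_prompt: int = 4,         # assumption: not stated for corpus generation
    threshold: float = R_THRESHOLD,
) -> list[dict]:
    """Keep only completions whose verifier score R meets the threshold."""
    kept = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            completion = generate(prompt)
            r = score(prompt, completion)
            if r >= threshold:
                kept.append({"prompt": prompt, "completion": completion, "R": r})
    return kept
```

The surviving rows are what an SFT corpus of this kind would consist of; everything below the threshold is simply discarded.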

Headline result

Held-out 50-scenario evaluation, mean R across 4 generations per scenario:

| Stage | Overall mean R | good_mean | bad_mean | R_A pass rate | AUC |
|---|---|---|---|---|---|
| Base (no adapter) | 0.208 | 0.205 | 0.210 | 0.55 | 0.478 |
| SFT-r1 (this adapter) | 0.289 | 0.311 | 0.268 | 0.68 | 0.590 |
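The card does not say how the AUC column was computed. One common reading, assumed here, is that R is treated as a ranking score and AUC is the probability that a randomly chosen good scenario scores higher than a randomly chosen bad one, so 0.5 means no discrimination. A minimal sketch of that pairwise definition (the function name is illustrative):

```python
def pairwise_auc(good_scores, bad_scores):
    """Probability that a random good score outranks a random bad score.

    Ties count as half a win, so identical score distributions give 0.5.
    """
    wins = 0.0
    for g in good_scores:
        for b in bad_scores:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(good_scores) * len(bad_scores))
```

Under this reading, a value slightly below 0.5 (as for the base model) means R ranks good and bad scenarios about as well as a coin flip.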

A single SFT round raises overall mean R by 39 % relative (0.208 → 0.289), with gains on both good (+51 %) and bad (+28 %) scenarios, the larger gain falling on good scenarios. AUC rises from 0.478 to 0.590: the adapter discriminates scenario quality slightly, whereas the base model sits at roughly chance level.
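The relative gains can be rechecked from the rounded table values. Small differences from the quoted percentages are expected, since the card's figures were presumably computed from unrounded means.

```python
# Rounded means copied from the results table above.
base = {"overall": 0.208, "good": 0.205, "bad": 0.210}
sft = {"overall": 0.289, "good": 0.311, "bad": 0.268}


def relative_gain(before: float, after: float) -> float:
    """Relative improvement in percent."""
    return (after - before) / before * 100.0


gains = {key: relative_gain(base[key], sft[key]) for key in base}
for key, value in gains.items():
    print(f"{key}: {value:+.1f} %")
# overall: +38.9 %
# good: +51.7 %
# bad: +27.6 %
```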

Training details

| Hyperparameter | Value |
|---|---|
| Base model | Qwen/Qwen3.5-9B-Base |
| Trainable params | 29.1M / 8.98B (0.32 %) |
| LoRA rank r | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 2 |
| Grad accumulation | 8 |
| Per-device batch | 1 |
| Effective batch | 16 (across 2 GPUs, FSDP FULL_SHARD) |
| Total grad steps | 84 |
| Final train loss | 0.216 |
| Final token accuracy | 95.2 % |
| Final entropy | 0.194 |
| Wall time | 69.6 min on 2× RTX 4090 |
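The effective batch size, step count, and trainable fraction follow from the other rows. The short check below assumes the last partial batch of each epoch still counts as one optimizer step, which is what makes the total come out to 84.

```python
import math

num_samples = 665      # corpus size from the card
epochs = 2
per_device_batch = 1
grad_accum = 8
num_gpus = 2

# One optimizer step consumes per_device_batch * grad_accum samples on each GPU.
effective_batch = per_device_batch * grad_accum * num_gpus

# Assumption: a trailing partial batch still produces a step (ceil, not floor).
steps_per_epoch = math.ceil(num_samples / effective_batch)
total_steps = steps_per_epoch * epochs

# Share of parameters that the LoRA adapter actually trains.
trainable_pct = 29.1e6 / 8.98e9 * 100

print(effective_batch, steps_per_epoch, total_steps, f"{trainable_pct:.2f} %")
# 16 42 84 0.32 %
```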

Trained with accelerate launch --use_fsdp --num_processes 2, using FSDP FULL_SHARD wrapped over Qwen3_5DecoderLayer. Hyperparameters follow precedents from [InstructGPT, Llama-2 RLHF].
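For readers who want to reproduce a comparable setup, the LoRA rows of the table map onto a PEFT `LoraConfig` roughly as below. This is a sketch rather than the project's training script: `task_type` is an assumption, and settings the card does not list (for example bias handling) are left at PEFT defaults.

```python
# LoRA settings copied from the hyperparameter table.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}


def build_lora_config():
    """Build a peft.LoraConfig from the table values."""
    # Imported lazily so the settings can be inspected without peft installed.
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)
```

With rank 16 and alpha 32, the default LoRA scaling factor alpha / r is 2.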

Use

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-9B-Base")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-9B-Base", torch_dtype=torch.bfloat16
).to("cuda")
model = PeftModel.from_pretrained(model, "zndx/sdg-sft-r1")
model.eval()

# Recommended: use with xgrammar-based constrained decoding against
# the SDG composition JSON schema. See the project's
# `make_xgrammar_logits_processor_factory` helper for the canonical
# wiring.
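If the project helper is not at hand, constrained decoding can also be wired up directly with xgrammar's Hugging Face logits processor. The sketch below is an assumption-laden stand-in, not the canonical wiring: `TOY_SCHEMA` is a hypothetical placeholder (the real SDG composition schema is not reproduced here), `generate_constrained` is an illustrative name, and the code assumes `model.config` exposes `vocab_size`.

```python
import json

# Hypothetical stand-in schema; replace with the real SDG composition schema.
TOY_SCHEMA = json.dumps({
    "type": "object",
    "properties": {
        "templates": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["templates"],
})


def generate_constrained(model, tok, prompt, schema=TOY_SCHEMA, max_new_tokens=512):
    """Generate text that is constrained to match a JSON schema."""
    # Imported lazily so the module loads without xgrammar installed.
    import xgrammar as xgr
    from xgrammar.contrib.hf import LogitsProcessor

    # Assumption: the model config exposes vocab_size at the top level.
    tokenizer_info = xgr.TokenizerInfo.from_huggingface(
        tok, vocab_size=model.config.vocab_size
    )
    compiler = xgr.GrammarCompiler(tokenizer_info)
    compiled = compiler.compile_json_schema(schema)

    inputs = tok(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        # A fresh processor per call, since it tracks grammar state while decoding.
        logits_processor=[LogitsProcessor(compiled)],
    )
    # Strip the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tok.decode(new_tokens, skip_special_tokens=True)
```

Called as `generate_constrained(model, tok, prompt)` with the `model` and `tok` loaded above, this is intended to return a JSON string matching the schema, provided generation finishes within `max_new_tokens`.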

Related artifacts

Citation

@misc{sdg-sft-r1-v01,
  title  = {SDG SFT Round-1 LoRA Adapter (v0.1)},
  author = {Hill, Ryan and contributors},
  year   = {2026},
  url    = {https://huggingface.co/zndx/sdg-sft-r1}
}