Qwen3-4B-Instruct-2507-lora001

This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Unsloth + PEFT (LoRA).

This repository contains LoRA adapter weights only. The base model must be loaded separately.

Training Objective

This adapter is trained to improve structured output accuracy (JSON / YAML / XML / TOML / CSV), with an emphasis on returning only the requested structured output (no explanations, no markdown fences).

Key design choices implemented in the training code:

  • Assistant-only loss: loss is applied only to the final assistant output tokens, while the prompt/context is provided as input.
  • CoT masking (optional): when enabled, training can ignore intermediate reasoning and apply loss only after markers such as Output: / Final: / Answer:.
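
Assistant-only loss is commonly implemented by setting the labels of prompt tokens to -100, the default ignore index of PyTorch's cross-entropy loss. The following is a minimal sketch of that idea, not the actual training code; `build_labels` and the toy token IDs are hypothetical.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def build_labels(prompt_ids, response_ids):
    """Concatenate prompt and response; only response tokens keep real labels."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


# Toy token IDs (made-up values, not real tokenizer output).
input_ids, labels = build_labels([11, 12, 13], [21, 22])
print(input_ids)  # [11, 12, 13, 21, 22]
print(labels)     # [-100, -100, -100, 21, 22]
```

The prompt is still fed to the model as context; it just contributes nothing to the gradient.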

In addition, the training pipeline applies output cleaning and format-biased augmentation:

  • Removes <think>...</think> blocks when present
  • Strips markdown fences and leading “explanation” lines
  • Cuts to the first likely structure start ({, [, <, YAML/TOML-like starts)
  • Adds TOML-focused augmentation (strict instruction copies + TOML repair tasks)
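
The first three cleaning steps above could look roughly like the sketch below. It is an illustration under assumptions: the regular expressions and the `clean_output` helper are hypothetical, the "structure start" test is a simple heuristic, and CSV output has no recognizable start so it falls through unchanged.

```python
import re

FENCE = "`" * 3  # markdown code-fence marker
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
FENCE_RE = re.compile(r"^" + re.escape(FENCE) + r"[\w+-]*[ \t]*$", re.MULTILINE)
# Heuristic: a line opening with {, [, <, or a "key:" / "key =" pair.
STRUCT_START_RE = re.compile(r"^\s*(\{|\[|<|[A-Za-z_][\w.-]*\s*[:=])")


def clean_output(text):
    """Drop <think> blocks and fence lines, then cut to the first structure-like line."""
    text = THINK_RE.sub("", text)
    text = FENCE_RE.sub("", text)
    lines = text.strip().splitlines()
    for i, line in enumerate(lines):
        if STRUCT_START_RE.match(line):
            return "\n".join(lines[i:]).strip()
    return text.strip()  # nothing structure-like found: return the text as-is


raw = (
    "<think>plan the keys</think>\n"
    "Here is the config:\n"
    + FENCE + "toml\n"
    'title = "demo"\n'
    "ports = [8000, 8001]\n"
    + FENCE + "\n"
)
print(clean_output(raw))
```

A heuristic like this can misfire, for example on an explanation line such as "Note: ...", which looks like a YAML key.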

Training Configuration

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Method: LoRA (PEFT) with Unsloth loader
  • Max sequence length: 1536
  • Epochs: 2
  • Learning rate: 5e-05
  • LoRA: r=128 (SFT_LORA_R), alpha=256 (SFT_LORA_ALPHA), dropout=0.05 (SFT_LORA_DROPOUT)
  • LoRA target modules (default):
    • q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj (SFT_LORA_TARGET_MODULES)
  • Eval/Save: every 200 steps (SFT_EVAL_STEPS, SFT_SAVE_STEPS)
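
The names in parentheses above look like environment variables. Assuming that is how the script is configured, the settings could be read as in the sketch below; `LoraSettings` and `load_lora_settings` are hypothetical names, and the defaults simply mirror the values listed above.

```python
import os
from dataclasses import dataclass

DEFAULT_TARGETS = "q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj"


@dataclass(frozen=True)
class LoraSettings:
    r: int
    alpha: int
    dropout: float
    target_modules: tuple
    eval_steps: int
    save_steps: int


def load_lora_settings(env=None):
    """Read SFT_* variables from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    modules = env.get("SFT_LORA_TARGET_MODULES", DEFAULT_TARGETS)
    return LoraSettings(
        r=int(env.get("SFT_LORA_R", "128")),
        alpha=int(env.get("SFT_LORA_ALPHA", "256")),
        dropout=float(env.get("SFT_LORA_DROPOUT", "0.05")),
        target_modules=tuple(m.strip() for m in modules.split(",") if m.strip()),
        eval_steps=int(env.get("SFT_EVAL_STEPS", "200")),
        save_steps=int(env.get("SFT_SAVE_STEPS", "200")),
    )


settings = load_lora_settings({})  # empty mapping: use the documented defaults
print(settings.r, settings.alpha, settings.dropout)  # 128 256 0.05
```

These fields correspond to the `r`, `lora_alpha`, `lora_dropout`, and `target_modules` arguments of `peft.LoraConfig`. With alpha=256 and r=128, the LoRA scaling factor alpha/r is 2.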

Notes:

  • The script currently loads the base model with load_in_4bit=False (i.e., not QLoRA 4-bit by default).
  • BF16 is enabled in TrainingArguments (bf16=True).

Data / Split

  • Dataset: u-10bei/structured_data_with_cot_dataset_512_v5
  • Validation split: 0.03
  • Seed: 3407

Batch / Steps

  • Train batch size (per device): 6
  • Eval batch size (per device): 8
  • Gradient accumulation steps: 4
  • Effective batch size: 24 × num_gpus (per_device_train_bs × grad_accum × num_gpus = 6 × 4 × num_gpus)
  • Max steps: -1 (epoch-based)
  • Logging steps: 20
  • Eval strategy: steps (eval_steps=200)
  • Save strategy: steps (save_steps=200, save_total_limit=6)
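
As a quick check of the batch arithmetic, the sketch below computes the effective batch size and a rough optimizer step count. The dataset size of 4800 is a made-up figure for illustration (the training set size is not stated here), and the step count is an approximation rather than the trainer's exact bookkeeping.

```python
import math


def effective_batch_size(per_device_bs=6, grad_accum=4, num_gpus=1):
    """Examples consumed per optimizer step."""
    return per_device_bs * grad_accum * num_gpus


def approx_optimizer_steps(num_examples, epochs=2, **batch_kwargs):
    """Rough optimizer step count for an epoch-based run (max_steps=-1)."""
    steps_per_epoch = math.ceil(num_examples / effective_batch_size(**batch_kwargs))
    return steps_per_epoch * epochs


print(effective_batch_size(num_gpus=1))  # 24
print(effective_batch_size(num_gpus=2))  # 48
print(approx_optimizer_steps(4800))      # 400, for a hypothetical 4800 training examples
```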

Optimization

  • LR scheduler: cosine
  • Warmup ratio: 0.03
  • Weight decay: 0.01
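
A cosine schedule with linear warmup can be written as a small function of the step index. The sketch below assumes the usual shape (linear ramp from 0 to the peak over the warmup steps, then a half-cosine decay to 0); `lr_at_step` and the total of 400 steps are hypothetical.

```python
import math


def lr_at_step(step, total_steps, base_lr=5e-5, warmup_ratio=0.03):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 400  # made-up run length for illustration
for step in (0, 6, 100, 200, 400):
    print(step, lr_at_step(step, TOTAL_STEPS))
```

With a warmup ratio of 0.03, only the first few percent of steps are spent ramping up; the rest of the run decays smoothly from 5e-05 toward zero.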

CoT masking (Output-only supervision)

  • mask_cot: enabled (SFT_MASK_COT=1)
  • output_markers: Output:, OUTPUT:, Final:, Answer:, Result:, Response:
  • output_learn_mode: after_marker
  • upsampling: disabled (SFT_USE_UPSAMPLING=0)
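
In `after_marker` mode, the supervised span starts right after an output marker. The sketch below locates that position at the character level. It assumes the earliest marker occurrence wins and that a response with no marker is supervised in full; both are guesses about the script's behavior, and `supervised_start` is a hypothetical helper.

```python
OUTPUT_MARKERS = ("Output:", "OUTPUT:", "Final:", "Answer:", "Result:", "Response:")


def supervised_start(text, markers=OUTPUT_MARKERS):
    """Return the character offset where loss-bearing text begins (0 if no marker)."""
    hits = [(text.find(m), m) for m in markers if m in text]
    if not hits:
        return 0  # assumption: no marker means the whole response is supervised
    idx, marker = min(hits)  # earliest marker occurrence
    start = idx + len(marker)
    while start < len(text) and text[start].isspace():
        start += 1  # skip whitespace between the marker and the payload
    return start


response = 'Reasoning: pick the keys.\nOutput:\ntitle = "demo"'
start = supervised_start(response)
print(repr(response[start:]))  # 'title = "demo"'
```

To turn the character offset into a token-level mask, one option is a fast tokenizer called with `return_offsets_mapping=True`, setting the label to -100 for every token that ends at or before `start`.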

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "your_id/your-repo"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

messages = [
    {
        "role": "user",
        "content": (
            "IMPORTANT:\n"
            "- Output ONLY the requested TOML.\n"
            "- No explanations, no markdown fences.\n"
            "- Ensure the output parses correctly.\n\n"
            "Task: Create a TOML config with title='demo' and ports=[8000,8001].\n\n"
            "Output:\n"
        )
    }
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,  # greedy decoding; temperature has no effect when sampling is off
    )

print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
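
Because the goal is output that parses, it can help to validate the decoded text before using it. The sketch below uses only the standard library; `parses` is a hypothetical helper, TOML checking needs Python 3.11+ (`tomllib`), and YAML is left out because it would require a third-party parser.

```python
import json
import xml.etree.ElementTree as ET

try:
    import tomllib  # standard library from Python 3.11
except ImportError:
    tomllib = None


def _parse_toml(text):
    if tomllib is None:
        raise RuntimeError("TOML validation needs Python 3.11+ (tomllib)")
    return tomllib.loads(text)


PARSERS = {"json": json.loads, "xml": ET.fromstring, "toml": _parse_toml}


def parses(fmt, text):
    """Return True if `text` parses as `fmt` ('json', 'xml', or 'toml')."""
    parser = PARSERS[fmt]  # KeyError for unsupported formats
    try:
        parser(text)
    except (ValueError, ET.ParseError):  # JSON and TOML decode errors subclass ValueError
        return False
    return True


print(parses("json", '{"title": "demo", "ports": [8000, 8001]}'))  # True
print(parses("xml", "<config><title>demo</title>"))                # False (unclosed tag)
```

For the TOML prompt above, the decoded generation could be passed as `parses("toml", generated_text)`, where `generated_text` stands for the string printed by the usage snippet.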

Sources & Terms (IMPORTANT)

Training data: u-10bei/structured_data_with_cot_dataset_512_v5

Dataset license: MIT License. The dataset is used and distributed under the terms of the MIT License.

Compliance: Users must comply with the MIT License (including its copyright notice) and with the base model's original terms of use.
