# Qwen3-4B-Instruct-2507-lora001

This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Unsloth + PEFT (LoRA).

This repository contains LoRA adapter weights only. The base model must be loaded separately.

Training Objective

This adapter is trained to improve structured output accuracy (JSON / YAML / XML / TOML / CSV), with an emphasis on returning only the requested structured output (no explanations, no markdown fences).

Key design choices implemented in the training code:

- Assistant-only loss: loss is applied only to the final assistant output tokens, while the prompt/context is provided as input.
- CoT masking (optional): when enabled, training can ignore intermediate reasoning and apply loss only after markers such as Output: / Final: / Answer:.
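
A minimal sketch of assistant-only loss, assuming the common convention of setting prompt labels to -100 so that cross-entropy ignores them. The function name `build_labels` and the token ids are hypothetical; the actual training script may build its labels differently.

```python
IGNORE_INDEX = -100  # label value that PyTorch cross-entropy skips by default

def build_labels(prompt_ids, answer_ids):
    """Concatenate prompt and answer tokens; supervise only the answer tokens."""
    input_ids = list(prompt_ids) + list(answer_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(answer_ids)
    return input_ids, labels

# Hypothetical token ids, for illustration only.
input_ids, labels = build_labels([11, 12, 13], [21, 22])
print(labels)
```

Only the positions that keep a real token id contribute to the loss; the prompt still conditions the model as input.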

In addition, the training pipeline applies output cleaning and format-biased augmentation:

- Removes <think>...</think> blocks when present
- Strips markdown fences and leading “explanation” lines
- Cuts to the first likely structure start ({, [, <, YAML/TOML-like starts)
- Adds TOML-focused augmentation (strict instruction copies + TOML repair tasks)
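
The cleaning steps above could look roughly like the following. This is a hedged sketch, not the pipeline's actual code: the regular expressions and the name `clean_output` are assumptions, and the "structure start" heuristic is deliberately simple (it would, for example, also accept an explanation line shaped like `Note: ...`).

```python
import re

FENCE = "`" * 3  # markdown code-fence marker
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
START_RE = re.compile(
    r"^\s*(?:[{\[<]"                      # JSON object/array, XML tag, TOML table
    r"|---\s*$"                           # YAML document marker
    r"|-\s+\S"                            # YAML list item
    r"|[A-Za-z_][\w.-]*\s*[:=](?:\s|$)"   # 'key: value' or 'key = value'
    r")"
)

def clean_output(text):
    """Drop think blocks and fence lines, then cut to the first structure-like line."""
    text = THINK_RE.sub("", text)
    lines = [ln for ln in text.splitlines() if not ln.strip().startswith(FENCE)]
    for i, line in enumerate(lines):
        if START_RE.match(line):
            return "\n".join(lines[i:]).strip()
    # No recognizable start (e.g. CSV): keep the text as-is.
    return "\n".join(lines).strip()

raw = (
    "<think>plan</think>\nHere is the config:\n"
    + FENCE + "toml\ntitle = \"demo\"\nports = [8000, 8001]\n" + FENCE
)
cleaned = clean_output(raw)
print(cleaned)
```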

Training Configuration

Base model: Qwen/Qwen3-4B-Instruct-2507
Method: LoRA (PEFT) with Unsloth loader
Max sequence length: 1536
Epochs: 2
Learning rate: 5e-05
LoRA: r=128 (SFT_LORA_R), alpha=256 (SFT_LORA_ALPHA), dropout=0.05 (SFT_LORA_DROPOUT)
LoRA target modules (default):
- q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj (SFT_LORA_TARGET_MODULES)
Eval/Save: every 200 steps (SFT_EVAL_STEPS, SFT_SAVE_STEPS)
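
As an illustration of how the `SFT_LORA_*` environment variables might map to these values, here is a small sketch. The `LoraSettings` dataclass and `load_lora_settings` function are hypothetical; only the variable names and defaults come from the list above, and the real script's parsing is not shown in this card.

```python
import os
from dataclasses import dataclass

DEFAULT_MODULES = "q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj"

@dataclass
class LoraSettings:
    r: int
    alpha: int
    dropout: float
    target_modules: list

def load_lora_settings(env=None):
    """Read LoRA hyperparameters from a mapping (os.environ when none is given)."""
    env = os.environ if env is None else env
    modules = env.get("SFT_LORA_TARGET_MODULES", DEFAULT_MODULES)
    return LoraSettings(
        r=int(env.get("SFT_LORA_R", "128")),
        alpha=int(env.get("SFT_LORA_ALPHA", "256")),
        dropout=float(env.get("SFT_LORA_DROPOUT", "0.05")),
        target_modules=[m.strip() for m in modules.split(",") if m.strip()],
    )

settings = load_lora_settings({})  # empty mapping, so the defaults apply
print(settings.alpha / settings.r)  # standard LoRA scaling factor alpha / r
```

Values like these would typically be passed on to `peft.LoraConfig` (`r`, `lora_alpha`, `lora_dropout`, `target_modules`).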

Notes:

The script currently loads the base model with load_in_4bit=False (i.e., not QLoRA 4-bit by default).
BF16 is enabled in TrainingArguments (bf16=True).

Data / Split

Dataset: u-10bei/structured_data_with_cot_dataset_512_v5
Validation split: 0.03
Seed: 3407

Batch / Steps

Train batch size (per device): 6
Eval batch size (per device): 8
Gradient accumulation steps: 4
Effective batch size: 24 × (number_of_gpus) (effective = per_device_train_bs × grad_accum × num_gpus)
Max steps: -1 (epoch-based)
Logging steps: 20
Eval strategy: steps (eval_steps=200)
Save strategy: steps (save_steps=200, save_total_limit=6)
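
The batch arithmetic above can be checked with a short calculation. The dataset size used here (4800 training examples) is a hypothetical number chosen for illustration, and the step count is an approximation; the trainer's exact rounding may differ.

```python
import math

def effective_batch_size(per_device_bs=6, grad_accum=4, num_gpus=1):
    """Examples consumed per optimizer step."""
    return per_device_bs * grad_accum * num_gpus

def optimizer_steps(num_train_examples, epochs=2, **batch_kwargs):
    """Approximate number of optimizer steps for an epoch-based run."""
    per_epoch = math.ceil(num_train_examples / effective_batch_size(**batch_kwargs))
    return per_epoch * epochs

print(effective_batch_size())            # single GPU
print(effective_batch_size(num_gpus=2))  # two GPUs
print(optimizer_steps(4800))             # hypothetical dataset size
```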

Optimization

LR scheduler: cosine
Warmup ratio: 0.03
Weight decay: 0.01
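
For intuition, a cosine schedule with linear warmup can be written as below. This is a sketch of the usual formula under the listed settings (base learning rate 5e-05, warmup ratio 0.03); the function name `lr_at` and the total of 400 steps are assumptions for illustration.

```python
import math

def lr_at(step, total_steps, base_lr=5e-5, warmup_ratio=0.03):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 400  # hypothetical total number of optimizer steps
for step in (0, 6, 12, 200, 400):
    print(step, lr_at(step, total))
```

The rate starts at zero, rises linearly during warmup, peaks at the base learning rate, and decays to zero at the final step.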

CoT masking (Output-only supervision)

mask_cot: enabled (SFT_MASK_COT=1)
output_markers: Output:, OUTPUT:, Final:, Answer:, Result:, Response:
output_learn_mode: after_marker
upsampling: disabled (SFT_USE_UPSAMPLING=0)
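
One way the after_marker mode could locate the supervised span is sketched below. It assumes the earliest marker in the assistant text wins and that text without any marker is supervised in full; both choices, and the name `supervised_span_start`, are assumptions rather than confirmed behavior of the training script.

```python
OUTPUT_MARKERS = ("Output:", "OUTPUT:", "Final:", "Answer:", "Result:", "Response:")

def supervised_span_start(assistant_text, markers=OUTPUT_MARKERS):
    """Return the character index just after the earliest marker (0 if none found)."""
    best_end = None
    best_idx = None
    for marker in markers:
        idx = assistant_text.find(marker)
        if idx != -1 and (best_idx is None or idx < best_idx):
            best_idx, best_end = idx, idx + len(marker)
    return 0 if best_end is None else best_end

text = 'Reasoning: pick the keys first.\nOutput:\ntitle = "demo"'
start = supervised_span_start(text)
print(repr(text[start:]))
```

The character offset can then be mapped to token positions (for example with a fast tokenizer's `return_offsets_mapping=True`), and every label before that position set to -100.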

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "your_id/your-repo"  # replace with this adapter's repository id

# Load the base model first, then attach the LoRA adapter weights.
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

messages = [
    {
        "role": "user",
        "content": (
            "IMPORTANT:\n"
            "- Output ONLY the requested TOML.\n"
            "- No explanations, no markdown fences.\n"
            "- Ensure the output parses correctly.\n\n"
            "Task: Create a TOML config with title='demo' and ports=[8000,8001].\n\n"
            "Output:\n"
        ),
    }
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; temperature has no effect when do_sample=False, so it is omitted.
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Sources & Terms (IMPORTANT)

Training data: u-10bei/structured_data_with_cot_dataset_512_v5

Dataset License: MIT License. This dataset is used and distributed under the terms of the MIT License.

Compliance: Users must comply with the MIT license (including copyright notice) and the base model's original terms of use.
