Qwen3.5-2B Chat LoRA (UltraChat SFT)
Overview
- LoRA fine-tuned model based on Qwen/Qwen3.5-2B
- Trained for chat-style instruction following
- Uses conversational formatting (<|user|>, <|assistant|>); an illustrative prompt layout appears after this list
- Optimized for fast training on a reduced subset of the dataset (10k samples)
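With a generation prompt appended, a single-turn prompt is laid out roughly like this (illustrative only; the authoritative formatting is whatever the base tokenizer's apply_chat_template returns):

<|user|>
Explain gravity briefly
<|assistant|>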
Important
- This repository contains LoRA adapter weights only (an optional merge sketch follows this list)
- Base model is NOT included
- Required base model: Qwen/Qwen3.5-2B
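Because only the adapter is stored here, you can optionally fold it into the base weights with PEFT's merge_and_unload() to produce a standalone checkpoint. A minimal sketch; the output directory name is a placeholder:

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Attach the adapter, then fold the LoRA deltas into the base weights
# so the merged model loads later without peft installed
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-2B", torch_dtype=torch.float16, trust_remote_code=True
)
merged = PeftModel.from_pretrained(base, "your-username/your-repo-name")
merged = merged.merge_and_unload()
merged.save_pretrained("qwen-chat-lora-merged")  # placeholder path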
Usage
- Load the base model and attach the adapter
- Apply the chat template so prompts match the training format
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model = "Qwen/Qwen3.5-2B"
adapter = "your-username/your-repo-name"

# Load the tokenizer and base model, then attach the LoRA adapter
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)
model = PeftModel.from_pretrained(model, adapter)
model.eval()

# Format the conversation with the model's chat template
messages = [
    {"role": "user", "content": "Explain gravity briefly"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sample a response
outputs = model.generate(
    **inputs,
    max_new_tokens=400,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id
)

# Decode only the newly generated tokens so the prompt is not echoed back
generated = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(generated, skip_special_tokens=True)
# Drop any reasoning block emitted before the final answer
if "</think>" in response:
    response = response.split("</think>")[-1]
print(response.strip())
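For reproducible outputs (e.g. in tests), sampling can be turned off in favor of greedy decoding, a standard transformers option:

outputs = model.generate(**inputs, max_new_tokens=400, do_sample=False)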
Training Details
- Base Model: Qwen/Qwen3.5-2B
- Method: LoRA (PEFT)
- Framework: Transformers + TRL (SFTTrainer)
- Dataset: HuggingFaceH4/ultrachat_200k (train_sft split)
- Data Used: 10,000 samples (subset for fast training; see the loading sketch after this list)
- Task: Chat-style supervised fine-tuning
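A minimal sketch of the data selection described above; the shuffle seed is an assumption, not a value taken from the actual training run:

from datasets import load_dataset

# train_sft split of UltraChat, reduced to the 10k-sample subset
dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
dataset = dataset.shuffle(seed=42).select(range(10_000))  # seed is illustrative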
Hyperparameters
- Max Steps: 1000
- Max Sequence Length: 1024
- Batch Size: 4
- Gradient Accumulation: 4 (effective batch size 16; see the training sketch after this list)
- Learning Rate: 2e-4
- Precision: bfloat16
- Gradient Checkpointing: Enabled
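A hedged sketch of how these settings map onto TRL's SFTTrainer. The LoRA rank/alpha and the output directory are assumptions (they are not documented above), and exact argument names can vary between TRL versions:

from trl import SFTConfig, SFTTrainer
from peft import LoraConfig

# LoRA settings are illustrative; the actual rank/alpha are not published here
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

args = SFTConfig(
    output_dir="qwen-chat-lora",       # placeholder
    max_steps=1000,
    max_seq_length=1024,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,     # effective batch size 16
    learning_rate=2e-4,
    bf16=True,
    gradient_checkpointing=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen3.5-2B",
    args=args,
    train_dataset=dataset,  # the 10k subset from the sketch above
    peft_config=peft_config,
)
trainer.train()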
Capabilities
- Multi-turn conversational responses (see the sketch after this list)
- Instruction-following chat behavior
- Handles both short and long-form answers
- Supports English and Hindi inputs (inherited from base model)
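For multi-turn use, pass earlier turns as alternating user/assistant messages; a short sketch reusing the loading code from Usage (the conversation content is illustrative):

messages = [
    {"role": "user", "content": "What is gravity?"},
    {"role": "assistant", "content": "Gravity is a fundamental force that attracts objects with mass toward each other."},
    {"role": "user", "content": "How does it differ on the Moon?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)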
Limitations
- Trained on a 10k-sample subset, not the full dataset
- May produce inconsistent or incomplete responses
- Not fully aligned for safety or factual accuracy
- May still show verbosity or repetition (a mitigation sketch follows this list)
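If repetition appears, transformers' repetition_penalty or no_repeat_ngram_size generation options can help; the values below are illustrative starting points, not tuned for this model:

outputs = model.generate(
    **inputs,
    max_new_tokens=400,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,  # illustrative value
)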
Example
- Input: What is gravity?
- Output: Gravity is a fundamental force that attracts objects with mass toward each other.
Model Type
- Architecture: Causal Language Model
- Task: Text Generation (Chat-style)
- Fine-tuning: LoRA Adapter
License
This model is released under the CC BY-NC 4.0 license.