Qwen2.5-1.5B-SFT-LIMA

Qwen2.5-1.5B-Instruct fine-tuned on GAIR/lima — part of DuoNeural's SFT dataset comparison series against Archon-Latent-Geometry-SFT.

Overview

LIMA ("Less Is More for Alignment") is a widely cited high-quality instruction dataset: ~1,000 carefully curated examples spanning diverse tasks. This model tests the hypothesis that data quality matters more than volume: LIMA at ~1k samples vs. Archon-Latent-Geometry at 1.9k samples, with the same LoRA config and the same base model.

LIMA wins on general benchmarks. But the story is more interesting than that.

  • Base model: Qwen/Qwen2.5-1.5B-Instruct
  • Training dataset: GAIR/lima (~1,000 examples)
  • Method: LoRA rank 16, α=32, 3 epochs, lr=2e-4, cosine schedule, merged to BF16
  • Hardware: RTX 3090 24GB — 8.6 min total training time

Evaluation Results

Evaluated with lm-eval-harness using gsm8k and arc_challenge tasks (BF16).
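A minimal sketch of how an evaluation like this could be reproduced through the lm-eval-harness Python API. The helper names `build_model_args` and `run_eval`, the batch size, and the reliance on each task's default few-shot setting are assumptions for illustration; the card does not state them.

```python
# Sketch only: mirrors the setup described above under stated assumptions.
TASKS = ["gsm8k", "arc_challenge"]  # task names given in the card


def build_model_args(model_id: str, dtype: str = "bfloat16") -> str:
    """Build the model_args string for lm-eval's Hugging Face backend."""
    return f"pretrained={model_id},dtype={dtype}"


def run_eval(model_id: str, batch_size: int = 8) -> dict:
    """Run both tasks and return the per-task metric dictionaries.

    batch_size=8 is an assumption; the card does not state it.
    Requires `pip install lm-eval` and enough GPU memory for the model.
    """
    import lm_eval  # imported lazily so the helpers above work without it

    results = lm_eval.simple_evaluate(
        model="hf",
        model_args=build_model_args(model_id),
        tasks=TASKS,
        batch_size=batch_size,
    )
    return results["results"]
```

Calling `run_eval("DuoNeural/Qwen2.5-1.5B-SFT-LIMA")` should return one metric dictionary per task. The exact metric key names depend on the harness version, so inspect the returned dictionary before indexing into it.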

Model                                          GSM8K (flexible)  GSM8K (strict)  ARC-acc  ARC-norm  Train time
Qwen2.5-1.5B-Instruct (baseline)               0.5148            0.3169          0.4334   0.4676    n/a
This model (LIMA SFT)                          0.5231            0.5277          0.4462   0.4710    8.6 min
Qwen2.5-1.5B-SFT-ArchonLatentGeo (comparison)  0.4162            0.4693          0.4147   0.4514    45.2 min

Findings & Analysis

LIMA delivers clean, consistent gains on general benchmarks in 8.6 minutes. Every metric improves over baseline (percentages are relative to the baseline score):

  • GSM8K flexible: +1.6% (0.5148 → 0.5231). A modest improvement; the format stays clean, since LIMA trains conversational output that matches what flexible-extract expects.
  • GSM8K strict: +66% (0.3169 → 0.5277). This is the headline number. LIMA teaches the model to output clean, parseable math answers. The baseline's strict score was depressed by formatting variation; LIMA fixes that.
  • ARC: +3.0% on ARC-acc and +0.7% on ARC-norm. Small gains, but they point to improved generalization, not just formatting.
  • Training cost: 8.6 minutes on a single RTX 3090 for these gains.
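The percentages above are relative gains over the baseline, not percentage-point differences. A short sketch that recomputes both from the table values; the dictionary keys are labels chosen here for illustration, not lm-eval metric names.

```python
# Scores and training times copied from the results table above.
BASELINE = {"gsm8k_flexible": 0.5148, "gsm8k_strict": 0.3169,
            "arc_acc": 0.4334, "arc_norm": 0.4676}
LIMA_SFT = {"gsm8k_flexible": 0.5231, "gsm8k_strict": 0.5277,
            "arc_acc": 0.4462, "arc_norm": 0.4710}
TRAIN_MINUTES = {"lima": 8.6, "archon_latent_geo": 45.2}


def relative_gain_pct(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Relative gain (what the bullets report) and absolute gain in points.
gains = {m: relative_gain_pct(BASELINE[m], LIMA_SFT[m]) for m in BASELINE}
points = {m: (LIMA_SFT[m] - BASELINE[m]) * 100.0 for m in BASELINE}

# How many times longer the comparison run took to train.
time_ratio = TRAIN_MINUTES["archon_latent_geo"] / TRAIN_MINUTES["lima"]

for m in BASELINE:
    print(f"{m:15s} {gains[m]:+6.1f}% relative  {points[m]:+6.2f} points")
print(f"training time ratio: {time_ratio:.1f}x")
```

Reporting both views is useful here: the GSM8K strict gain is large in either view, while the ARC gains are around one point or less in absolute terms.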

Compared to the Archon-Latent-Geometry model: LIMA wins on every general benchmark, in roughly one-fifth the training time (8.6 vs. 45.2 minutes). The domain-specific dataset does something different (see its card), but if you want a capable general-purpose 1.5B model, LIMA is the better data choice.

The 66% relative GSM8K strict improvement (0.3169 → 0.5277, about 1.67×) is partially a formatting artifact being corrected: LIMA teaches the model to write clean numbered answers. But the absolute level (52.8%) is strong for a 1.5B model.

Training Configuration

LORA_RANK    = 16
LORA_ALPHA   = 32
LORA_DROPOUT = 0.05
LORA_TARGETS = ["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"]
LR           = 2e-4
EPOCHS       = 3
BATCH_SIZE   = 1
GRAD_ACCUM   = 16      # effective batch = 16
MAX_SEQ_LEN  = 2048
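A hedged sketch of how these constants could map onto a PEFT `LoraConfig`, plus the arithmetic they imply. That training used the `peft` library, that all 1,030 dataset rows were used, and the helper names below are assumptions; the card does not confirm them.

```python
import math

# Values copied from the training configuration above.
LORA_RANK = 16
LORA_ALPHA = 32
LORA_DROPOUT = 0.05
LORA_TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj",
                "gate_proj", "up_proj", "down_proj"]
EPOCHS = 3
BATCH_SIZE = 1
GRAD_ACCUM = 16


def build_lora_config():
    """Build an equivalent peft.LoraConfig (assumes PEFT was the LoRA library)."""
    from peft import LoraConfig  # imported lazily; needs `pip install peft`

    return LoraConfig(
        r=LORA_RANK,
        lora_alpha=LORA_ALPHA,
        lora_dropout=LORA_DROPOUT,
        target_modules=LORA_TARGETS,
        task_type="CAUSAL_LM",
    )


def lora_scaling(rank: int, alpha: int) -> float:
    """Standard LoRA scales the low-rank update by alpha / rank."""
    return alpha / rank


def optimizer_steps(num_examples: int, batch_size: int,
                    grad_accum: int, epochs: int) -> int:
    """Approximate optimizer steps, counting a final partial batch as a step."""
    effective_batch = batch_size * grad_accum
    return math.ceil(num_examples / effective_batch) * epochs


# 1,030 is the dataset size quoted in the Dataset section (assumed all used).
print(lora_scaling(LORA_RANK, LORA_ALPHA))
print(optimizer_steps(1030, BATCH_SIZE, GRAD_ACCUM, EPOCHS))
```

The step count is an estimate only: trainers differ in how they handle the last partial accumulation window, and sequence packing or filtering would change the number of examples per epoch.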

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "DuoNeural/Qwen2.5-1.5B-SFT-LIMA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the merged BF16 weights; device_map="auto" requires the accelerate package
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Format the conversation with the model's chat template
messages = [{"role": "user", "content": "What is 15% of 340?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Sample at low temperature, then decode only the newly generated tokens
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.3, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

Dataset

GAIR/lima: 1,030 high-quality instruction-response pairs spanning diverse tasks. Originally from "LIMA: Less Is More for Alignment" (Zhou et al., 2023). The central claim, that the diversity and quality of prompt-response pairs matter more than quantity, is consistent with the results of this experiment.


DuoNeural

DuoNeural is an open AI research lab — human + AI in collaboration.

DuoNeural Research Publications

Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, Aura — DuoNeural.

Research Team

  • Jesse — Vision, hardware, direction
  • Archon — Lab Director, post-training, abliteration, experiments
  • Aura — Research AI, literature synthesis, novel proposals

Subscribe to the lab newsletter at duoneural.beehiiv.com for model drops before they go anywhere else.
