Qwen2.5-1.5B-SFT-LIMA

Qwen2.5-1.5B-Instruct fine-tuned on GAIR/lima — part of DuoNeural's SFT dataset comparison series against Archon-Latent-Geometry-SFT.

Overview

LIMA ("Less Is More for Alignment") is the canonical high-quality instruction dataset: ~1,000 carefully curated examples spanning diverse tasks. This model tests the hypothesis that data quality matters more than volume: LIMA at ~1k samples vs. Archon-Latent-Geometry at 1.9k samples, with the same LoRA config and the same base model.

LIMA wins on general benchmarks. But the story is more interesting than that.

Base model: Qwen/Qwen2.5-1.5B-Instruct
Training dataset: GAIR/lima (~1,000 examples)
Method: LoRA rank 16, α=32, 3 epochs, lr=2e-4, cosine schedule, merged to BF16
Hardware: RTX 3090 24GB — 8.6 min total training time
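
To make "merged to BF16" concrete: LoRA learns two low-rank matrices A and B per target layer, and merging folds their scaled product back into the frozen weight, W' = W + (α/r)·B·A, so the released checkpoint needs no adapter at inference time. With this card's settings (rank 16, α=32) the scale factor α/r is 2. The sketch below shows that arithmetic on toy matrices in plain Python. The matrix values and the helper names (`matmul`, `merge_lora`) are made up for illustration; the actual merge was presumably done by the training library, not by code like this.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(weight, lora_b, lora_a, rank, alpha):
    """Return W + (alpha / rank) * (B @ A), the merged weight."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy example: a 2x2 weight with a rank-1 adapter (illustrative values only).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # shape (out_features, rank)
A = [[0.5, -0.5]]    # shape (rank, in_features)

# alpha / rank = 2, the same ratio as this card's alpha=32, rank=16.
merged = merge_lora(W, B, A, rank=1, alpha=2)
print(merged)
```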

Evaluation Results

Evaluated with lm-eval-harness using gsm8k and arc_challenge tasks (BF16).

Model	GSM8K (flexible)	GSM8K (strict)	ARC-acc	ARC-norm	Training time
Qwen2.5-1.5B-Instruct (baseline)	0.5148	0.3169	0.4334	0.4676	—
This model (LIMA SFT)	0.5231	0.5277	0.4462	0.4710	8.6 min
Qwen2.5-1.5B-SFT-ArchonLatentGeo (comparison)	0.4162	0.4693	0.4147	0.4514	45.2 min
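
For anyone wanting to reproduce a row of this table, the following is a rough sketch using the harness's Python entry point. The card does not state few-shot counts, batch size, or the harness version, so this relies on task defaults and may not match the reported numbers exactly. The metric key names follow the lm-eval 0.4.x naming as best I know it and should be treated as an assumption; `run_eval` and `summarize` are hypothetical helper names.

```python
def run_eval(model_id="DuoNeural/Qwen2.5-1.5B-SFT-LIMA"):
    """Run both tasks through lm-eval-harness (downloads the model; needs a GPU)."""
    import lm_eval  # imported lazily so summarize() works without the package

    return lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={model_id},dtype=bfloat16",
        tasks=["gsm8k", "arc_challenge"],
    )


def summarize(results):
    """Pull the four table columns out of a harness-style results dict."""
    gsm8k = results["results"]["gsm8k"]
    arc = results["results"]["arc_challenge"]
    return {
        "GSM8K (flexible)": gsm8k["exact_match,flexible-extract"],
        "GSM8K (strict)": gsm8k["exact_match,strict-match"],
        "ARC-acc": arc["acc,none"],
        "ARC-norm": arc["acc_norm,none"],
    }


# Mock results shaped like the harness output, filled with this card's numbers.
mock = {
    "results": {
        "gsm8k": {
            "exact_match,flexible-extract": 0.5231,
            "exact_match,strict-match": 0.5277,
        },
        "arc_challenge": {"acc,none": 0.4462, "acc_norm,none": 0.4710},
    }
}
row = summarize(mock)
print(row)
```

In practice, `summarize(run_eval())` would replace the mock dictionary.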

Findings & Analysis

LIMA delivers clean, consistent gains on general benchmarks in 8.6 minutes. Every metric improves over baseline:

GSM8K flexible +1.6% relative (0.5148 → 0.5231): a small gain, and the format stays clean (LIMA trains conversational output, which matches what flexible-extract expects).
GSM8K strict +66% relative (0.3169 → 0.5277): this is the headline number. LIMA teaches the model to output clean, parseable math answers. The baseline's strict score was artificially low due to formatting variation; LIMA fixes that.
ARC +3.0% and ARC-norm +0.7% relative (0.4334 → 0.4462 and 0.4676 → 0.4710): a modest gain on a multiple-choice task, so it is not just a formatting effect.
All of this comes from 8.6 minutes of training on a single RTX 3090. Extremely efficient.
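
The percentages in this list are relative changes over the baseline, not percentage-point differences. A short check of that arithmetic, using only the scores from the results table (the helper name `relative_gain` is just for illustration):

```python
baseline = {
    "GSM8K (flexible)": 0.5148,
    "GSM8K (strict)": 0.3169,
    "ARC-acc": 0.4334,
    "ARC-norm": 0.4676,
}
lima = {
    "GSM8K (flexible)": 0.5231,
    "GSM8K (strict)": 0.5277,
    "ARC-acc": 0.4462,
    "ARC-norm": 0.4710,
}


def relative_gain(before, after):
    """Relative change in percent, e.g. 0.50 -> 0.55 is +10%."""
    return (after - before) / before * 100


gains = {name: round(relative_gain(baseline[name], lima[name]), 1) for name in baseline}
points = {name: round((lima[name] - baseline[name]) * 100, 2) for name in baseline}

for name in baseline:
    print(f"{name}: +{gains[name]}% relative, +{points[name]} points absolute")
```

In absolute terms the strict GSM8K gain is about 21 points, while the other three metrics move by roughly one point or less.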

Compared to the Archon-Latent-Geometry model: LIMA wins on every general benchmark, in roughly one-fifth of the training time (8.6 vs. 45.2 min). The domain-specific dataset does something different (see its card), but if you want a capable general-purpose 1.5B model, LIMA is the dataset to use.

The 66% relative GSM8K strict improvement (0.3169 → 0.5277, about 1.67×) is partially a formatting artifact being corrected: LIMA teaches the model to write clean numbered answers. But the absolute level (52.8%) is strong for a 1.5B model.
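
The gap between strict and flexible scoring is easier to see with a toy version of the two extraction rules. The sketch below is a simplified approximation, not the harness's actual filters: it assumes strict matching looks for GSM8K's `#### <number>` answer line, while flexible matching takes the last number that appears anywhere in the output. The regexes, function names, and sample outputs are all illustrative.

```python
import re

# Assumed strict rule: the answer must follow GSM8K's "#### " marker.
STRICT = re.compile(r"#### (-?[0-9.,]+)")
# Assumed flexible rule: any number counts, and the last one wins.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def strict_extract(text):
    """Return the number after '#### ', or None if the marker is missing."""
    match = STRICT.search(text)
    return match.group(1).replace(",", "") if match else None


def flexible_extract(text):
    """Return the last number in the text, or None if there is none."""
    matches = NUMBER.findall(text)
    return matches[-1].replace(",", "") if matches else None


loose = "15% of 340 is 0.15 * 340 = 51. So the answer is 51."
clean = "0.15 * 340 = 51\n#### 51"

for name, output in [("loose", loose), ("clean", clean)]:
    print(name, strict_extract(output), flexible_extract(output))
```

Both outputs contain the right answer, but only the second would be credited under the strict rule, which is the kind of formatting variation that can depress a strict score without reflecting weaker math.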

Training Configuration

LORA_RANK    = 16
LORA_ALPHA   = 32
LORA_DROPOUT = 0.05
LORA_TARGETS = ["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"]
LR           = 2e-4
EPOCHS       = 3
BATCH_SIZE   = 1
GRAD_ACCUM   = 16      # effective batch = 16
MAX_SEQ_LEN  = 2048
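
As a back-of-the-envelope reading of this configuration, the sketch below estimates the number of trainable LoRA parameters, the number of optimizer steps, and the learning rate along a cosine schedule. Several inputs are assumptions rather than facts from this card: the layer shapes come from the published Qwen2.5-1.5B model config, the example count assumes all 1,030 GAIR/lima training rows were used, and the schedule assumes no warmup and decay to zero. The trainer's real step count may differ.

```python
import math

LORA_RANK = 16
LR = 2e-4
EPOCHS = 3
EFFECTIVE_BATCH = 1 * 16  # BATCH_SIZE * GRAD_ACCUM

# Assumed Qwen2.5-1.5B shapes (from the published model config, not this card).
HIDDEN, INTERMEDIATE, LAYERS = 1536, 8960, 28
KV_DIM = 2 * 128  # 2 key/value heads x head dimension 128

# (in_features, out_features) for each LoRA target in one decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# Each adapted layer adds A (rank x in_features) and B (out_features x rank).
per_layer = sum(LORA_RANK * (i + o) for i, o in TARGET_SHAPES.values())
trainable_params = per_layer * LAYERS

NUM_EXAMPLES = 1030  # assumed: the full GAIR/lima train split
steps_per_epoch = math.ceil(NUM_EXAMPLES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS


def cosine_lr(step, total=total_steps, peak=LR):
    """Cosine decay from peak to 0 with no warmup (warmup is not stated in the card)."""
    return 0.5 * peak * (1 + math.cos(math.pi * step / total))


print(f"trainable LoRA params: {trainable_params:,}")
print(f"optimizer steps: {total_steps} ({steps_per_epoch} per epoch)")
print(f"lr at start / end: {cosine_lr(0):.1e} / {cosine_lr(total_steps):.1e}")
```

Under these assumptions the adapter has about 18.5M trainable parameters and the whole run is under 200 optimizer steps, which is consistent with a training time measured in minutes.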

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "DuoNeural/Qwen2.5-1.5B-SFT-LIMA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "What is 15% of 340?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.3, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

Dataset

GAIR/lima: 1,030 high-quality instruction-response pairs spanning diverse tasks. Originally from "LIMA: Less Is More for Alignment" (Zhou et al., 2023). The central claim, that the diversity and quality of prompt-response pairs matter more than quantity, is consistent with the results of this experiment.
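
For readers who want to inspect or reuse the data, here is a minimal sketch of turning one LIMA record into chat-template messages. It assumes each record stores its turns in a `conversations` field as a flat list of alternating user and assistant strings; that field name is recalled from the dataset's schema and should be checked against the dataset card. The helper names are hypothetical, and loading details (access approval on the Hub, library version) may differ in practice.

```python
def to_messages(conversations):
    """Map alternating [user, assistant, user, ...] strings to chat messages."""
    roles = ("user", "assistant")
    return [
        {"role": roles[i % 2], "content": turn}
        for i, turn in enumerate(conversations)
    ]


def load_lima_as_messages():
    """Load the train split and convert every record (requires `datasets`)."""
    from datasets import load_dataset  # imported lazily; needs Hub access

    train = load_dataset("GAIR/lima", split="train")
    return [to_messages(row["conversations"]) for row in train]


# Toy record with the assumed layout (not taken from the dataset).
example = to_messages(["What is 15% of 340?", "15% of 340 is 51."])
print(example)
```

The resulting list has the same shape as the `messages` argument passed to `tokenizer.apply_chat_template` in the Usage section.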

DuoNeural

DuoNeural is an open AI research lab — human + AI in collaboration.

Platform	Link
HuggingFace	huggingface.co/DuoNeural
Website	duoneural.com
GitHub	github.com/DuoNeural
X / Twitter	@DuoNeural
Email	duoneural@proton.me
Newsletter	duoneural.beehiiv.com
Support	buymeacoffee.com/duoneural

DuoNeural Research Publications

Title	DOI
Nano-CTM: Ternary Continuous Thought Machines with Thought-Space Self-Prediction for Efficient Iterative Reasoning	10.5281/zenodo.19775622
Recurrence as World Model: CTM Learns Implicit Belief States in Partially Observable Physical Environments	10.5281/zenodo.19810620
Per-Object Slot Decomposition for Scalable Neural World Modeling: When Does Attention Beat Mean-Field?	10.5281/zenodo.19846804
The Dynamical Horizon Principle: CTM Gates Converge to the Predictability Limit of Dynamical Systems	10.5281/zenodo.19952612

Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, Aura — DuoNeural.

Research Team

Jesse — Vision, hardware, direction
Archon — Lab Director, post-training, abliteration, experiments
Aura — Research AI, literature synthesis, novel proposals

Subscribe to the lab newsletter at duoneural.beehiiv.com for model drops before they go anywhere else.

Paper: LIMA: Less Is More for Alignment (arXiv:2305.11206, published May 18, 2023)