Sparrow-1M iter37 — Position Coupling NEGATIVE RESULT

Preserved as negative-result evidence following the 2026-05-06 abort of Phase E, Phase 1.

Task: 3-digit multiplication, in-distribution
Accuracy: 2% (clean negative result; iter6a baseline was 76%)
Format: PC-modified (operands MSB-first, result LSB-first; each sample gets a random starting position offset drawn from U[1, max_pos - len(line)])
Architecture: same as iter6a (1.3M-parameter dense Qwen3, RoPE, GQA with 4 query heads and 2 KV heads)
Training: 25K steps, peak_lr 1.5e-3 (changed from an earlier 3e-4 to avoid divergence)
Final loss: 0.2966 (averaged over 100 steps). This is lower than iter6a's ~0.62, yet greedy decoding still gets the middle digits wrong.
Likely root cause: Position Coupling routes digit significance through position_ids, which RoPE then turns into per-position cos/sin rotations; the rotation appears to destroy argmax precision while keeping the average cross-entropy low.
Recipe gate triggered: "If eval-30 < 80% at step 25K we abort and fall back to scale-up of the standard-RoPE iter6a 3M ladder"
Author: Crownelius (github.com/Crownelius)
Code: github.com/Crownelius/crowfeather-50m-v1 (commit f7bdb72)
Source paper: Cho et al. 2024, "Position Coupling" (arxiv:2405.20671)
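
The Format line above can be sketched as follows. This is a minimal sketch, not the code at commit f7bdb72: it assumes a simplified, addition-style coupling in which every digit's position_id is offset + significance (so the ones digits of both operands and of the product share one id) and the separator tokens get id 0. The paper's multiplication scheme may differ, and pc_format_mul is a hypothetical name.

```python
import random


def pc_format_mul(a: int, b: int, max_pos: int,
                  rng: random.Random) -> tuple[list[str], list[int]]:
    """Hypothetical Position-Coupling-style sample for a * b (simplified coupling)."""
    a_str, b_str = str(a), str(b)          # operands, MSB-first
    c_rev = str(a * b)[::-1]               # product, LSB-first
    line = f"{a_str}*{b_str}={c_rev}"
    # Per-sample random start offset drawn from U[1, max_pos - len(line)]
    offset = rng.randint(1, max_pos - len(line))

    tokens: list[str] = []
    pos: list[int] = []

    def emit_msb_first(digits: str) -> None:
        # Digit i of an MSB-first number has significance len(digits) - 1 - i
        for i, d in enumerate(digits):
            tokens.append(d)
            pos.append(offset + len(digits) - 1 - i)

    emit_msb_first(a_str)
    tokens.append("*")
    pos.append(0)                          # separators get id 0 (assumption)
    emit_msb_first(b_str)
    tokens.append("=")
    pos.append(0)
    for sig, d in enumerate(c_rev):        # LSB-first: significance counts up
        tokens.append(d)
        pos.append(offset + sig)
    return tokens, pos


tokens, pos = pc_format_mul(123, 456, max_pos=64, rng=random.Random(0))
print("".join(tokens))  # 123*456=88065
print(pos)
```

Because the offset is resampled per example, the model sees every absolute id range during training, while digits of equal significance always stay coupled to each other.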
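
To make the root-cause hypothesis concrete, here is a pure-Python RoPE sketch. It assumes the interleaved-pair convention and base 10000; Hugging Face's Qwen3 code instead pairs dimension i with i + d/2, which keeps the same relative-position property. The sketch shows that the query-key score depends only on the difference between position_ids, so under coupling, tokens that share an id see zero relative rotation and can only be told apart by content. It illustrates the mechanism; it does not test the hypothesis.

```python
import math


def rope(x: list[float], pos: int, base: float = 10000.0) -> list[float]:
    """Rotate each pair (x[2j], x[2j+1]) by angle pos * base ** (-2j / d)."""
    d = len(x)
    out: list[float] = []
    for i in range(0, d, 2):              # i = 2j
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.append(x[i] * c - x[i + 1] * s)
        out.append(x[i] * s + x[i + 1] * c)
    return out


def score(q: list[float], q_pos: int, k: list[float], k_pos: int) -> float:
    """Unscaled attention logit between a rotated query and a rotated key."""
    return sum(a * b for a, b in zip(rope(q, q_pos), rope(k, k_pos)))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]
print(score(q, 5, k, 3), score(q, 12, k, 10))  # equal up to float error
print(score(q, 7, k, 7), sum(a * b for a, b in zip(q, k)))  # same id: no rotation
```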
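
The 2% figure is exact match; locating the wrong middle digits needs a per-position breakdown. A minimal sketch, assuming predictions and targets are plain digit strings in the same LSB-first order. The schema of the eval JSON files is not described here, so digit_accuracy is a hypothetical helper and the strings in the usage line are made up.

```python
def digit_accuracy(preds: list[str], targets: list[str]) -> tuple[float, list[float]]:
    """Exact-match accuracy plus accuracy at each digit position (hypothetical helper)."""
    exact = sum(p == t for p, t in zip(preds, targets)) / len(targets)
    width = max(len(t) for t in targets)
    hits = [0] * width
    counts = [0] * width
    for p, t in zip(preds, targets):
        for j, d in enumerate(t):          # j = position as written (LSB-first here)
            counts[j] += 1
            if j < len(p) and p[j] == d:   # a missing predicted digit counts as wrong
                hits[j] += 1
    per_pos = [h / c for h, c in zip(hits, counts)]
    return exact, per_pos


# Illustrative strings only, not taken from the eval JSON files
exact, per_pos = digit_accuracy(["88065", "81165"], ["88065", "88065"])
print(exact, per_pos)  # 0.5 [1.0, 0.5, 0.5, 1.0, 1.0]
```

A dip in the middle of per_pos with high accuracy at both ends is the pattern the Final loss line describes.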

The repo includes eval_pc_mul_3d.json and eval_pc_mul_3d_v2.json, which record the 2% scores. The next planned experiment, iter38a/b/c, follows McLeish et al. (Abacus embeddings + looped transformer, arxiv:2405.17399), an approach that avoids the position_ids/RoPE interaction entirely.
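
For contrast, a sketch of the Abacus indexing idea as described by McLeish et al.: each digit is indexed by its 1-based position within its own number, and that index selects a learned embedding added to the token embedding, instead of entering attention through position_ids. Assumptions: numbers are written LSB-first, non-digit tokens get index 0, and one random offset is applied per sample during training. abacus_indices is a hypothetical name, not taken from the iter38 code.

```python
def abacus_indices(tokens: list[str], offset: int = 0) -> list[int]:
    """Hypothetical sketch: index each digit by its 1-based position within its number.

    Non-digit tokens get index 0. During training, `offset` would be a random
    per-sample shift so the embedding table also learns indices beyond the
    lengths seen in the data.
    """
    idx: list[int] = []
    run = 0
    for t in tokens:
        if t.isdigit():
            run += 1                 # next digit of the current number
            idx.append(run + offset)
        else:
            run = 0                  # a non-digit token ends the current number
            idx.append(0)
    return idx


print(abacus_indices(list("321*654=")))  # [1, 2, 3, 0, 1, 2, 3, 0]
```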

Weights: Safetensors, 1.26M params, tensor type F32

Papers for Crownelius/sparrow-1m-iter37-pc-negative

Cho et al. (2024). Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure. arXiv:2405.20671, published May 31, 2024.

McLeish et al. (2024). Transformers Can Do Arithmetic with the Right Embeddings. arXiv:2405.17399, published May 27, 2024.