TensionLM-117M-Reasoning-v2

This is a research TensionLM checkpoint packaged in safetensors format. It is the locally validated reasoning-v2 release from the bozo workspace. The 117M curriculum TensionLM substrate was kept intact, except for a localized relaxation of the upper blocks on answer-prefix formal/code data.

TensionLM uses sigmoid tension instead of softmax attention. Token-pair constraints are scored independently, so multiple past tokens can remain active at full strength instead of competing for a single softmax budget.
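The difference between a shared softmax budget and independent sigmoid gating can be illustrated with a few lines of plain Python. This is only a sketch of the general idea, not the TensionLM layer itself: the function names and the example scores are assumptions made for illustration, and the real scoring in model.py may differ.

```python
import math

def softmax_weights(scores):
    """Normalize scores so the weights share a single budget that sums to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_weights(scores):
    """Gate each score on its own; the weights are not forced to sum to 1."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

# Hypothetical scores: three strongly relevant past tokens and one irrelevant one.
scores = [4.0, 4.0, 4.0, -4.0]
soft = softmax_weights(scores)
sig = sigmoid_weights(scores)

print("softmax:", [round(w, 3) for w in soft])
print("sigmoid:", [round(w, 3) for w in sig])
```

Under softmax the three relevant tokens split the budget roughly three ways, while under sigmoid each of them stays close to full strength.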

What changed

Base: checkpoints/117m-curriculum/pytorch_model.pt
Released checkpoint: checkpoints/formal-repair-v2-prefix-only-seed42/latest.pt
Repair scope: upper blocks 8-11
Repair contract: answer-prefix completions, avoiding Question:/Answer: wrapper drift
Benchmark: held-out TAC v2, 120 prompts, 40/40/40 arithmetic/code/transitivity
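One way a repair scope like "upper blocks 8-11" might be expressed is as a filter over parameter names. The sketch below assumes a hypothetical naming scheme of the form blocks.<index>.<rest>; the actual names in model.py are not given here and may differ, and the function name is likewise made up for illustration.

```python
import re

# Hypothetical parameter naming scheme; adjust the pattern to match model.py.
BLOCK_RE = re.compile(r"^blocks\.(\d+)\.")

def in_repair_scope(param_name, first=8, last=11):
    """Return True if the parameter belongs to a block in [first, last]."""
    m = BLOCK_RE.match(param_name)
    return m is not None and first <= int(m.group(1)) <= last

# Illustrative names only, not taken from the released checkpoint.
names = [
    "embed.weight",
    "blocks.0.attn.w",
    "blocks.7.mlp.w",
    "blocks.8.attn.w",
    "blocks.11.mlp.w",
    "head.weight",
]
in_scope = [n for n in names if in_repair_scope(n)]
print(in_scope)
```

With a PyTorch module, the same predicate could be applied while iterating over model.named_parameters() and setting requires_grad on each parameter, so that only the selected blocks receive updates.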

Local held-out TAC v2 eval

Raw generation, seed 42, max_new=12, temp=0.3, top_p=0.9.
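For readers unfamiliar with the temp and top_p settings, the following is a minimal sketch of temperature scaling followed by nucleus (top-p) sampling over a single logit vector. It uses the standard library only and is not the code in inference.py; the function name, the toy logits, and the use of random.Random for seeding are assumptions.

```python
import math
import random

def sample_top_p(logits, temperature=0.3, top_p=0.9, rng=random):
    """Sample one token index using temperature scaling and nucleus filtering."""
    # Lower temperature sharpens the distribution before the softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most-probable tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Draw from the kept tokens in proportion to their probabilities.
    r = rng.random() * mass
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]

rng = random.Random(42)          # fixed seed for repeatable draws
logits = [2.0, 1.0, 0.0, -1.0]   # toy logits over a 4-token vocabulary
token = sample_top_p(logits, rng=rng)
print(token)
```

At temp=0.3 the top token in this toy example already carries more than 90% of the probability mass, so the nucleus contains a single token and sampling is effectively greedy.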

Model	Prefix	Substring	Arithmetic prefix	Code prefix	Transitivity prefix
GPT-2 124M	3/120 (2.5%)	5/120	1/40	2/40	0/40
Base TensionLM 117M	7/120 (5.8%)	11/120	0/40	1/40	6/40
Reasoning-v2 repair	20/120 (16.7%)	21/120	1/40	6/40	13/40
Category-shuffled control	6/120 (5.0%)	6/120	0/40	4/40	2/40
Global-shuffled control	5/120 (4.2%)	5/120	1/40	1/40	3/40
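The percentages and per-category counts in the table can be cross-checked with a short script. The numbers below are copied from the table; the variable and function names are illustrative only.

```python
# (prefix_total, arithmetic, code, transitivity), copied from the table above.
rows = {
    "GPT-2 124M": (3, 1, 2, 0),
    "Base TensionLM 117M": (7, 0, 1, 6),
    "Reasoning-v2 repair": (20, 1, 6, 13),
    "Category-shuffled control": (6, 0, 4, 2),
    "Global-shuffled control": (5, 1, 1, 3),
}
TOTAL_PROMPTS = 120  # 40 arithmetic + 40 code + 40 transitivity

def prefix_rate(name):
    """Return the overall prefix hit rate in percent for one table row."""
    total, arith, code, trans = rows[name]
    # The per-category prefix counts should add up to the overall prefix count.
    assert total == arith + code + trans, name
    return 100.0 * total / TOTAL_PROMPTS

rates = {name: round(prefix_rate(name), 1) for name in rows}
for name, rate in rates.items():
    print(f"{name}: {rate}%")
```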

The repaired model beats GPT-2, the base 117M checkpoint, and both matched shuffled controls on prefix score for this local held-out benchmark. The gain is strongest in transitivity and code; arithmetic remains weak.
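The exact definitions of the prefix and substring scores live in the eval receipts and are not restated here. A plausible reading, offered only as an assumption, is that a prefix hit requires the generation to begin with the expected answer, while a substring hit only requires the answer to appear somewhere in it. The function names and example strings below are hypothetical.

```python
def prefix_hit(generated, expected):
    """Assumed metric: the generation starts with the expected answer."""
    return generated.strip().startswith(expected)

def substring_hit(generated, expected):
    """Assumed metric: the expected answer appears anywhere in the generation."""
    return expected in generated

# Hypothetical continuations for a prompt ending in "then A implies".
direct = " C, because implication is transitive"
wrapped = " Answer: C"

print(prefix_hit(direct, "C"), substring_hit(direct, "C"))
print(prefix_hit(wrapped, "C"), substring_hit(wrapped, "C"))
```

Under this reading, a continuation that drifts into an Answer: wrapper still counts as a substring hit but not as a prefix hit, which would make the prefix score the stricter of the two.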

Usage

pip install torch tokenizers safetensors huggingface_hub
python inference.py --repo_id BoggersTheFish/TensionLM-117M-Reasoning-v2 --prompt "If A implies B and B implies C then A implies"

Or after cloning/downloading the repo:

python inference.py --model_dir . --prompt "In Python, list(range(4)) ends with"

Files

model-*.safetensors - sharded weights
config.json - TensionLM config and release metadata
tokenizer.json - tokenizer used by the checkpoint
model.py - model definition
inference.py - minimal generation script
eval/release_summary.json - exact local release summary
eval/*_seed42.json - formal eval receipts used for the table
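After downloading the repo, a quick sanity check of the layout listed above can be done with the standard library alone. This sketch only lists the shard files and reads config.json as plain JSON; the function name is an assumption, no particular config keys are assumed, and the demo runs on a throwaway directory with placeholder files rather than the real weights.

```python
import json
import tempfile
from pathlib import Path

def describe_model_dir(model_dir):
    """Summarize which of the expected release files exist in model_dir."""
    root = Path(model_dir)
    shards = sorted(p.name for p in root.glob("model-*.safetensors"))
    config = json.loads((root / "config.json").read_text(encoding="utf-8"))
    return {
        "shards": shards,
        "config_keys": sorted(config),
        "has_tokenizer": (root / "tokenizer.json").exists(),
    }

# Demo on a temporary directory with placeholder files (not real weights).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "model-00001-of-00002.safetensors").write_bytes(b"")
    (root / "model-00002-of-00002.safetensors").write_bytes(b"")
    (root / "config.json").write_text(json.dumps({"placeholder": True}), encoding="utf-8")
    (root / "tokenizer.json").write_text("{}", encoding="utf-8")
    info = describe_model_dir(root)

print(info)
```

Pointing describe_model_dir at the cloned repo directory instead of the temporary one gives the same summary for the real files.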

Limitations

This is not an instruction-tuned assistant. It is a small research model and can produce wrong, repetitive, or incoherent continuations. The evaluation above is local and narrow; it should not be read as broad GPT-2 superiority or broad softmax-attention superiority. The next intended release path is a full Path A run with GPT-2 tokenizer, W=256, ProofPile/formal stage, math+code stage, and logic_mix=0.10 once GPU compute is available.
