TensionLM-117M-Reasoning-v2

This is a research TensionLM checkpoint packaged in safetensors format. It is the locally validated reasoning-v2 release from the bozo workspace. The 117M curriculum TensionLM substrate was kept intact; the only change is a localized relaxation of the upper blocks, trained on answer-prefix formal/code data.

TensionLM uses sigmoid tension instead of softmax attention. Token-pair constraints are scored independently, so multiple past tokens can remain active at full strength instead of competing for a single softmax budget.
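
To make the contrast concrete, here is a minimal sketch in plain Python. It is not the code in model.py; the score values and function names are hypothetical and chosen only to show how independent sigmoid scoring differs from softmax normalization.

```python
import math

def softmax(scores):
    """Normalize scores so the weights sum to 1 (tokens compete for one budget)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_gates(scores):
    """Score each token pair independently; the weights need not sum to 1."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

# Hypothetical raw scores of one query token against four past tokens:
# three strongly relevant tokens and one irrelevant token.
scores = [6.0, 6.0, 6.0, -6.0]

softmax_weights = softmax(scores)
sigmoid_weights = sigmoid_gates(scores)

# Under softmax the three relevant tokens split the budget (about 1/3 each).
print([round(w, 3) for w in softmax_weights])
# Under sigmoid each relevant token stays close to full strength.
print([round(w, 3) for w in sigmoid_weights])
```

With three equally relevant past tokens, softmax caps each at roughly one third, while the sigmoid weights stay near 1 for all three.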

What changed

  • Base: checkpoints/117m-curriculum/pytorch_model.pt
  • Released checkpoint: checkpoints/formal-repair-v2-prefix-only-seed42/latest.pt
  • Repair scope: upper blocks 8-11
  • Repair contract: answer-prefix completions, avoiding Question:/Answer: wrapper drift
  • Benchmark: held-out TAC v2, 120 prompts, 40/40/40 arithmetic/code/transitivity

Local held-out TAC v2 eval

Raw generation, seed 42, max_new=12, temp=0.3, top_p=0.9.
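
For readers unfamiliar with these decoding settings, the sketch below shows one common way to combine temperature scaling with top-p (nucleus) filtering. It is an illustration only: inference.py may implement sampling differently, and sample_top_p and the toy logits are hypothetical names and values.

```python
import math
import random

def sample_top_p(logits, temperature=0.3, top_p=0.9, rng=random):
    """Sample one token id using temperature scaling and top-p filtering."""
    # Temperature below 1 sharpens the distribution before normalizing.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most-likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Weighted choice over the kept tokens (weights are renormalized implicitly).
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]

rng = random.Random(42)              # fixed seed, mirroring "seed 42"
toy_logits = [2.0, 1.0, 0.5, -1.0]   # hypothetical next-token logits
samples = [sample_top_p(toy_logits, rng=rng) for _ in range(12)]  # 12 draws, like max_new=12
print(samples)
```

At temp=0.3 the top token in this toy example already holds more than 90% of the probability mass, so the nucleus contains only that token. Low temperature plus top-p makes generation close to greedy while still allowing some variation on flatter distributions.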

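| Model                     | Prefix         | Substring | Arithmetic prefix | Code prefix | Transitivity prefix |
|---------------------------|----------------|-----------|-------------------|-------------|---------------------|
| GPT-2 124M                | 3/120 (2.5%)   | 5/120     | 1/40              | 2/40        | 0/40                |
| Base TensionLM 117M       | 7/120 (5.8%)   | 11/120    | 0/40              | 1/40        | 6/40                |
| Reasoning-v2 repair       | 20/120 (16.7%) | 21/120    | 1/40              | 6/40        | 13/40               |
| Category-shuffled control | 6/120 (5.0%)   | 6/120     | 0/40              | 4/40        | 2/40                |
| Global-shuffled control   | 5/120 (4.2%)   | 5/120     | 1/40              | 1/40        | 3/40                |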

The repaired model beats GPT-2, the base 117M checkpoint, and both matched shuffled controls on prefix score for this local held-out benchmark. Relative to the base checkpoint, the gain is strongest in transitivity (6/40 to 13/40) and code (1/40 to 6/40); arithmetic remains weak (1/40).
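
The two headline metrics can be read as follows, though this is an assumption rather than the documented scoring rule: "Prefix" counts completions that begin with the expected answer, and "Substring" counts completions that contain it anywhere. The sketch below uses made-up completions and a hypothetical score function with assumed normalization (whitespace stripping, lowercasing); the exact rules live in the eval receipts.

```python
def score(completions, expected):
    """Return (prefix_hits, substring_hits) under the assumed metric definitions."""
    prefix_hits = 0
    substring_hits = 0
    for text, answer in zip(completions, expected):
        # Assumed normalization: trim whitespace and ignore case.
        text = text.strip().lower()
        answer = answer.strip().lower()
        if text.startswith(answer):
            prefix_hits += 1
        if answer in text:
            substring_hits += 1
    return prefix_hits, substring_hits

# Made-up completions, not taken from the TAC v2 benchmark.
completions = [" C", " 3", "Answer: 7", " banana"]
expected = ["C", "3", "7", "12"]

prefix_hits, substring_hits = score(completions, expected)
print(f"prefix {prefix_hits}/{len(expected)}, substring {substring_hits}/{len(expected)}")
```

Under this reading, a completion that drifts into an "Answer:" wrapper can still earn a substring hit but loses the prefix hit, which is why the repair contract targets answer-prefix completions.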

Usage

pip install torch tokenizers safetensors huggingface_hub
python inference.py --repo_id BoggersTheFish/TensionLM-117M-Reasoning-v2 --prompt "If A implies B and B implies C then A implies"

Or after cloning/downloading the repo:

python inference.py --model_dir . --prompt "In Python, list(range(4)) ends with"

Files

  • model-*.safetensors - sharded weights
  • config.json - TensionLM config and release metadata
  • tokenizer.json - tokenizer used by the checkpoint
  • model.py - model definition
  • inference.py - minimal generation script
  • eval/release_summary.json - exact local release summary
  • eval/*_seed42.json - formal eval receipts used for the table

Limitations

This is not an instruction-tuned assistant. It is a small research model and can produce wrong, repetitive, or incoherent continuations. The evaluation above is local and narrow; it should not be read as broad GPT-2 superiority or broad softmax-attention superiority. The next intended release path is a full Path A run with GPT-2 tokenizer, W=256, ProofPile/formal stage, math+code stage, and logic_mix=0.10 once GPU compute is available.
