TensionLM-117M-Reasoning-v2
This is a research TensionLM checkpoint packaged in safetensors format.
It is the locally validated reasoning-v2 release from the bozo workspace:
the 117M curriculum TensionLM substrate was kept intact except for localized
upper-block relaxation on answer-prefix formal/code data.
TensionLM uses sigmoid tension instead of softmax attention. Token-pair constraints are scored independently, so multiple past tokens can remain active at full strength instead of competing for a single softmax budget.
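The difference between a shared softmax budget and independent sigmoid scoring can be illustrated with a few lines of plain Python. This is a minimal sketch, not the TensionLM implementation in `model.py`: the function names and the raw scores are hypothetical, and the real model may scale, mask, or window its scores differently.

```python
import math

def softmax_weights(scores):
    """Normalize scores so the weights over past tokens sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_weights(scores):
    """Score each token pair independently; there is no shared budget."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

# Hypothetical raw scores of one query token against four past tokens:
# three strongly relevant tokens and one irrelevant token.
scores = [6.0, 6.0, 6.0, -6.0]

soft = softmax_weights(scores)
sig = sigmoid_weights(scores)

print("softmax:", [round(w, 3) for w in soft], "sum =", round(sum(soft), 3))
print("sigmoid:", [round(w, 3) for w in sig], "sum =", round(sum(sig), 3))
```

Under softmax the three relevant tokens split the unit budget at roughly one third each, while under the sigmoid each of them keeps a weight close to 1.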
What changed
- Base: `checkpoints/117m-curriculum/pytorch_model.pt`
- Released checkpoint: `checkpoints/formal-repair-v2-prefix-only-seed42/latest.pt`
- Repair scope: upper blocks 8-11
- Repair contract: answer-prefix completions, avoiding `Question:`/`Answer:` wrapper drift
- Benchmark: held-out TAC v2, 120 prompts, 40/40/40 arithmetic/code/transitivity
Local held-out TAC v2 eval
Raw generation, seed 42, max_new=12, temp=0.3, top_p=0.9.
| Model | Prefix | Substring | Arithmetic prefix | Code prefix | Transitivity prefix |
|---|---|---|---|---|---|
| GPT-2 124M | 3/120 (2.5%) | 5/120 | 1/40 | 2/40 | 0/40 |
| Base TensionLM 117M | 7/120 (5.8%) | 11/120 | 0/40 | 1/40 | 6/40 |
| Reasoning-v2 repair | 20/120 (16.7%) | 21/120 | 1/40 | 6/40 | 13/40 |
| Category-shuffled control | 6/120 (5.0%) | 6/120 | 0/40 | 4/40 | 2/40 |
| Global-shuffled control | 5/120 (4.2%) | 5/120 | 1/40 | 1/40 | 3/40 |
The repaired model beats GPT-2, the base 117M checkpoint, and both matched shuffled controls on prefix score for this local held-out benchmark. Relative to the base checkpoint, the gain is strongest in transitivity (6/40 to 13/40) and code (1/40 to 6/40); arithmetic remains weak at 1/40.
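As a quick consistency check, the overall prefix column can be recomputed from the three per-category columns. The sketch below only re-tallies the counts copied from the table above; it does not rerun the evaluation, and the names `prefix_hits` and `prefix_rate` are illustrative rather than part of the released scripts.

```python
# Prefix hits per category (arithmetic, code, transitivity), copied from
# the table above. Each category has 40 held-out prompts.
PROMPTS_PER_CATEGORY = 40

prefix_hits = {
    "GPT-2 124M": (1, 2, 0),
    "Base TensionLM 117M": (0, 1, 6),
    "Reasoning-v2 repair": (1, 6, 13),
    "Category-shuffled control": (0, 4, 2),
    "Global-shuffled control": (1, 1, 3),
}

def prefix_rate(hits):
    """Return (total hits, total prompts, percentage) for one model."""
    total = sum(hits)
    n = PROMPTS_PER_CATEGORY * len(hits)
    return total, n, 100.0 * total / n

for name, hits in prefix_hits.items():
    total, n, pct = prefix_rate(hits)
    print(f"{name}: {total}/{n} ({pct:.1f}%)")
```

The recomputed totals should agree with the Prefix column, for example 20/120 (16.7%) for the repaired model.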
Usage
pip install torch tokenizers safetensors huggingface_hub
python inference.py --repo_id BoggersTheFish/TensionLM-117M-Reasoning-v2 --prompt "If A implies B and B implies C then A implies"
Or after cloning/downloading the repo:
python inference.py --model_dir . --prompt "In Python, list(range(4)) ends with"
Files
- `model-*.safetensors` - sharded weights
- `config.json` - TensionLM config and release metadata
- `tokenizer.json` - tokenizer used by the checkpoint
- `model.py` - model definition
- `inference.py` - minimal generation script
- `eval/release_summary.json` - exact local release summary
- `eval/*_seed42.json` - formal eval receipts used for the table
Limitations
This is not an instruction-tuned assistant. It is a small research model and can
produce wrong, repetitive, or incoherent continuations. The evaluation above is
local and narrow; it should not be read as broad GPT-2 superiority or broad
softmax-attention superiority. The next intended release path is a full Path A
run with GPT-2 tokenizer, W=256, ProofPile/formal stage, math+code stage, and
logic_mix=0.10 once GPU compute is available.