TensionLM-117M-TS-Reasoner-v3

This is the CPU-only TS reasoning layer built around the frozen TensionLM-117M-Reasoning-v2 release.

It is not a new dense checkpoint. It is a small, auditable reasoning shell that does the following:

parse the prompt into a tiny constraint graph,
solve/verify arithmetic, transitivity, or small Python facts on CPU,
return the first rule-derived answer directly,
fall back to the frozen TensionLM model only when no rule matches.
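The four steps above can be sketched as a small rule dispatcher. This is a minimal illustration under stated assumptions, not the contents of cpu_reasoner.py. The rule names (arithmetic_rule, transitivity_rule), the prompt regexes, the answer() helper and the frozen_lm stand-in are all hypothetical, and only two of the three rule families are shown.

```python
import re
from collections import defaultdict, deque
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical rule signature: a rule returns an answer string, or None if it does not match.
Rule = Callable[[str], Optional[str]]

# Hypothetical word-to-operator table for "A <op> B equals" prompts.
OPS = {
    "plus": lambda a, b: a + b,
    "minus": lambda a, b: a - b,
    "times": lambda a, b: a * b,
}


def arithmetic_rule(prompt: str) -> Optional[str]:
    """Compute 'A <op> B equals' exactly instead of letting the LM guess."""
    m = re.fullmatch(r"\s*(-?\d+)\s+(plus|minus|times)\s+(-?\d+)\s+equals\s*", prompt)
    if m is None:
        return None
    a, op, b = int(m.group(1)), m.group(2), int(m.group(3))
    return str(OPS[op](a, b))


def transitivity_rule(prompt: str) -> Optional[str]:
    """Parse 'X leads to Y' premises into a tiny graph and follow it from the query node."""
    m = re.fullmatch(r"\s*If (.+), then (\w+) leads to\s*", prompt)
    if m is None:
        return None
    premises, start = m.group(1), m.group(2)
    edges = defaultdict(list)
    for src, dst in re.findall(r"(\w+) leads to (\w+)", premises):
        edges[src].append(dst)
    # Breadth-first search; collect reachable nodes with no outgoing edges (endpoints).
    seen, queue, endpoints = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        if not edges[node] and node != start:
            endpoints.append(node)
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # Answer only when the graph yields exactly one endpoint; otherwise decline.
    return endpoints[0] if len(endpoints) == 1 else None


def answer(prompt: str, rules: Sequence[Rule], fallback: Callable[[str], str]) -> Tuple[str, str]:
    """Return (answer, source): the first rule that fires wins; otherwise call the fallback."""
    for rule in rules:
        result = rule(prompt)
        if result is not None:
            return result, rule.__name__
    return fallback(prompt), "fallback"


RULES = [arithmetic_rule, transitivity_rule]


def frozen_lm(prompt: str) -> str:
    """Stand-in for the frozen TensionLM fallback (no model is loaded here)."""
    return "<frozen model output>"


print(answer("46 plus 9 equals", RULES, frozen_lm))
print(answer("If red leads to blue and blue leads to green, then red leads to", RULES, frozen_lm))
print(answer("Write a haiku about rain", RULES, frozen_lm))
```

In this sketch a rule returns None instead of guessing, so a prompt no rule can answer (including an ambiguous or cyclic "leads to" graph) falls through to the model, which is the "only when no rule matches" behavior described in the list.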

Held-out TAC v2 result

Benchmark: heldout_formal_tac_v2, 120 prompts, balanced 40/40/40 across arithmetic, code, and transitivity.

System	Prefix match	Substring match
GPT-2 124M	3/120	5/120
Base TensionLM 117M	7/120	11/120
TensionLM-117M-Reasoning-v2	20/120	21/120
TS-Reasoner-v3 CPU layer	120/120	120/120
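The card does not define the two columns. A plausible reading, assumed here and not confirmed against eval_cpu_reasoner.py, is that Prefix counts outputs that begin with the expected answer and Substring counts outputs that contain it anywhere. The normalization and function names below are hypothetical.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization, not the official one)."""
    return " ".join(text.lower().split())


def prefix_hit(output: str, expected: str) -> bool:
    """Assumed 'Prefix' metric: the output begins with the expected answer."""
    return normalize(output).startswith(normalize(expected))


def substring_hit(output: str, expected: str) -> bool:
    """Assumed 'Substring' metric: the expected answer appears anywhere in the output."""
    return normalize(expected) in normalize(output)


def score(pairs):
    """Count (prefix, substring) hits over (model_output, expected_answer) pairs."""
    prefix = sum(prefix_hit(out, exp) for out, exp in pairs)
    substring = sum(substring_hit(out, exp) for out, exp in pairs)
    return prefix, substring


# Toy outputs, invented for illustration only (not from the eval receipt).
pairs = [(" Green.", "green"), ("the answer is 55", "55"), ("7", "8")]
print(score(pairs))  # (1, 2)
```

Under this reading every prefix hit is also a substring hit, which is consistent with the Substring column never being lower than the Prefix column in the table.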

This is a system score, not a raw language-model score. The major jump comes from moving reasoning into explicit TS graph/verifier operations instead of trying to retrain all 117M parameters without a GPU.

Usage

python inference.py --prompt "If red leads to blue and blue leads to green, then red leads to" --category transitivity
python inference.py --prompt "46 plus 9 equals" --category arithmetic
python inference.py --prompt "In Python, list(range(8)) ends with" --category code_reasoning
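For the code_reasoning category, a "small Python fact" can be checked by actually running a tiny, whitelisted expression on CPU rather than asking the model. The sketch below assumes that approach; safe_eval, code_fact_rule, the prompt pattern and the ALLOWED_CALLS whitelist are hypothetical names, not taken from cpu_reasoner.py.

```python
import ast
import re
from typing import Optional

# Hypothetical whitelist: the only names a "Python fact" may call.
ALLOWED_CALLS = {"list": list, "range": range, "sorted": sorted}


def safe_eval(expr: str):
    """Evaluate a tiny expression after rejecting any AST node outside a small whitelist."""
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            if not (isinstance(node.func, ast.Name) and node.func.id in ALLOWED_CALLS):
                raise ValueError("call not allowed")
            if node.keywords:
                raise ValueError("keyword arguments not allowed")
        elif isinstance(node, ast.Name):
            if node.id not in ALLOWED_CALLS:
                raise ValueError(f"name not allowed: {node.id}")
        elif isinstance(node, ast.Constant):
            if type(node.value) is not int:
                raise ValueError("only integer constants are allowed")
        elif not isinstance(node, (ast.Expression, ast.Load, ast.UnaryOp, ast.USub)):
            raise ValueError(f"syntax not allowed: {type(node).__name__}")
    # No builtins are exposed; only the whitelisted callables are in scope.
    return eval(compile(tree, "<fact>", "eval"), {"__builtins__": {}}, dict(ALLOWED_CALLS))


def code_fact_rule(prompt: str) -> Optional[str]:
    """Answer 'In Python, <expr> ends with' (or 'starts with') by running <expr>."""
    m = re.fullmatch(r"\s*In Python, (.+?) (ends|starts) with\s*", prompt)
    if m is None:
        return None
    try:
        value = safe_eval(m.group(1))
    except (SyntaxError, ValueError, TypeError):
        return None  # unparseable or disallowed: let the fallback model handle it
    if not isinstance(value, list) or not value:
        return None
    return repr(value[-1] if m.group(2) == "ends" else value[0])


print(code_fact_rule("In Python, list(range(8)) ends with"))
print(code_fact_rule("In Python, sorted(range(3)) starts with"))
print(code_fact_rule("In Python, __import__('os') ends with"))
```

The AST whitelist is the auditable part: anything outside it makes the rule decline, so the prompt would go to the fallback model. It does not bound memory (for example, a very large range), so a real deployment would need size limits as well.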

Files

cpu_reasoner.py - auditable CPU solver/verifier layer.
inference.py - minimal CLI.
eval_cpu_reasoner.py - formal-eval-compatible evaluator.
eval/ts_reasoner_v3_heldout_tac_v2_seed42.json - full eval receipt.
config.json - release metadata and fallback model reference.

Limitations

This artifact is intentionally narrow. It handles short formal prompts that match the implemented arithmetic, transitivity, and Python-fact rules. It is not a chat assistant, and its 120/120 score is not evidence that the raw frozen model alone solved all 120 prompts. The point is the no-GPU path: freeze the language substrate, then move capability through inspectable graph/verifier machinery.
