TensionLM-117M-TS-Reasoner-v8
This is the explainable-boundary CPU TS reasoner for the frozen TensionLM-117M-Reasoning-v2 substrate.
v8 keeps the v6/v7 solver path and adds richer abstention decisions:
- explicit refusal rules such as graph:cycle_detected, graph:ambiguous_branch, arithmetic:division_by_zero, and code:unknown_function,
- low-confidence bands for unsupported or partial prompts,
- a mixed TAC-GEN distribution that interleaves solvable standard/paraphrase prompts with unsolvable unknown prompts.
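As a rough illustration of how explicit refusal rules of this kind might behave on a chain-following prompt, the sketch below walks edges from a start node and returns <ABSTAIN> with a reason code when it meets a branch or a cycle. The function name resolve_chain, the edge-list input format, and the "solved" label are assumptions made for illustration; only the reason codes and the <ABSTAIN> token come from this card, and this is not the code shipped in inference.py.

```python
ABSTAIN = "<ABSTAIN>"  # refusal token named in this card


def resolve_chain(edges, start):
    """Follow edges from `start` to a terminal node.

    Hypothetical helper: returns (answer, reason). The reason is a
    refusal code when the chain cannot be resolved unambiguously.
    """
    # Build a successor map: node -> set of nodes it points to.
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)

    seen = {start}
    node = start
    while node in successors:
        next_nodes = successors[node]
        if len(next_nodes) > 1:
            # More than one outgoing edge: no single terminal node.
            return ABSTAIN, "graph:ambiguous_branch"
        node = next(iter(next_nodes))
        if node in seen:
            # Revisiting a node means the chain never terminates.
            return ABSTAIN, "graph:cycle_detected"
        seen.add(node)
    return node, "solved"  # "solved" is an illustrative label


# Separate edges that are not reachable from the start node are ignored.
print(resolve_chain([("a", "b"), ("b", "c"), ("x", "b")], "a"))
# Two outgoing edges from the same node trigger a refusal.
print(resolve_chain([("a", "b"), ("a", "c")], "a"))
```

Under these assumptions, the first call resolves to c because the edge x -> b is never reached from a, and the second call abstains with graph:ambiguous_branch because a has two successors.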
The point is still no-GPU reasoning: frozen language substrate plus explicit TS graph/program operators, boundary detection, and inspectable confidence.
Eval receipts
Fixed benchmark scores:
| System | TAC v2 | TAC v3 | TAC v4 |
|---|---|---|---|
| TS-Reasoner-v8 | 120/120 | 120/120 | 120/120 |
Generated benchmark scores:
| Distribution | Engine | Score | Solve rate |
|---|---|---|---|
| TAC-GEN paraphrase, seed 9101 | v8 | 3000/3000 | 100% |
| TAC-GEN unknown, seeds 9201-9204 | v8 | 12000/12000 | 0% |
| TAC-GEN mixed, seeds 9301-9304 | v8 | 12000/12000 | 65.7% |
For the unknown distribution, correctness means returning <ABSTAIN>.
In the mixed distribution, standard/paraphrase prompts are solved and unknown
prompts are refused in the same pressure field.
These are system scores, not raw LLM scores.
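One way to read the Score and Solve rate columns together is sketched below. It assumes that Score counts outputs matching the expected answer (with <ABSTAIN> as the expected answer for unknown prompts) and that Solve rate is the fraction of prompts that received a non-abstaining answer. Both the score_run function and that reading of Solve rate are assumptions, not the evaluation code behind the tables.

```python
ABSTAIN = "<ABSTAIN>"  # refusal token named in this card


def score_run(outputs, expected):
    """Hypothetical scorer: correctness and solve rate are tracked separately."""
    total = len(expected)
    # An abstention is correct when the prompt is meant to be refused.
    correct = sum(out == exp for out, exp in zip(outputs, expected))
    # Assumed definition: a prompt is "solved" if the system gave an answer.
    solved = sum(out != ABSTAIN for out in outputs)
    return {"score": f"{correct}/{total}", "solve_rate": solved / total}


# A tiny mixed run: two solvable prompts and one unknown prompt.
print(score_run(["c", ABSTAIN, "d"], ["c", ABSTAIN, "d"]))
```

Under this reading, a run made up only of unknown prompts can score full marks with a 0% solve rate, which matches the shape of the unknown row in the table above.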
Usage
python inference.py --prompt "Handoff log: a hands off to b; b hands off to c. Ignore the separate handoff x hands off to b. The handoff chain beginning at a ends at" --category transitivity --show_trace
python inference.py --prompt "Graph ledger: main(a,b); main(a,c). Resolve main* from a; terminal node:" --category transitivity --show_trace
python demo_ts_reasoner_v8.py
Limitations
This artifact handles generated formal prompt families covered by the included operators. It is not a chat assistant, not raw model improvement, and not a claim of open-ended natural language understanding. The confidence values are rule-calibrated system signals over these families, not probabilities over all natural language.