Phase 2 TS-native model — 13.5M params, open-web-math, val PPL 86.50

Files changed (4)

README.md +69 -0
config.json +15 -0
pytorch_model.pt +3 -0
tokenizer.json +0 -0

README.md ADDED

	@@ -0,0 +1,69 @@

+---
+language: en
+license: mit
+tags:
+  - language-model
+  - tension
+  - causal-lm
+  - novel-architecture
+---
+# TensionLM
+A language model that uses sigmoid *tension* in place of softmax attention.
+## Architecture
+Standard transformers use softmax attention: for each query, the weights over
+all positions are normalized to sum to 1, so positions compete for a fixed
+budget. TensionLM replaces this with independent sigmoid scores, so each token
+pair is scored on its own and the weights need not sum to 1.
+```
+tau[t, w] = sigmoid( dot(q_t, k_{t-w-1}) / √d )
+output[t] = Σ_w  tau[t, w] * v_{t-w-1}
+```
+## Usage
+```python
+import torch
+from model import TensionConfig, TensionLM, generate
+from tokenizers import Tokenizer
+ckpt      = torch.load("pytorch_model.pt", map_location="cpu", weights_only=False)  # full unpickle: trusted files only
+model     = TensionLM(TensionConfig(**ckpt["cfg"]))
+state     = {k.replace("_orig_mod.", ""): v for k, v in ckpt["model"].items()}  # strip the torch.compile key prefix
+model.load_state_dict(state)
+tokenizer = Tokenizer.from_file("tokenizer.json")
+enc    = tokenizer.encode("The cat sat")
+ids    = generate(model, enc.ids, max_new=100, temp=0.8, top_p=0.92)
+result = tokenizer.decode(ids)
+print(result)
+```
+Or use the CLI:
+```bash
+python3 generate.py --checkpoint pytorch_model.pt --prompt "The cat sat"
+```
+## Training
+Trained for 30,518 steps on wikitext-2-raw-v1. See [github.com/BoggersTheFish/bozo](https://github.com/BoggersTheFish/bozo) for training code.
+## Model card
+| Property | Value |
+|----------|-------|
+| Parameters | 13,573,894 |
+| Architecture | TensionLM (sigmoid tension, windowed) |
+| Dataset | wikitext-2-raw-v1 |
+| Val PPL | 86.50 |
+| Context window | 32 tokens per layer × 6 layers |
+## Limitations
+This is a research model. It does not follow instructions, has not been
+fine-tuned, and may produce incoherent or incorrect text. It is intended
+to demonstrate the tension mechanism, not as a production system.
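The tension formula in the README's Architecture section can be sketched as a small single-head reference in plain Python. This is only an illustration of the two equations, not the repository's implementation: the function name `tension_layer`, the list-of-lists tensor layout, and the choice to skip look-back positions that fall before the start of the sequence are all assumptions.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function; fine for the small scores used in this sketch."""
    return 1.0 / (1.0 + math.exp(-x))


def tension_layer(q, k, v, window):
    """Single-head reference for the README formula (hypothetical helper).

    tau[t, w] = sigmoid(dot(q_t, k_{t-w-1}) / sqrt(d))
    output[t] = sum_w tau[t, w] * v_{t-w-1}

    q, k, v are lists of equal-length float vectors, one per position.
    Assumption: look-back positions before the sequence start are skipped.
    """
    scale = math.sqrt(len(q[0]))
    outputs = []
    for t in range(len(q)):
        acc = [0.0] * len(v[0])
        for w in range(window):
            s = t - w - 1              # strictly earlier position
            if s < 0:
                break                  # nothing further back to look at
            score = sum(a * b for a, b in zip(q[t], k[s])) / scale
            tau = sigmoid(score)       # independent gate, no normalization
            for i, x in enumerate(v[s]):
                acc[i] += tau * x
        outputs.append(acc)
    return outputs


# With all-zero queries every score is 0, so every gate is sigmoid(0) = 0.5.
q = [[0.0, 0.0]] * 3
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out_w1 = tension_layer(q, k, v, window=1)
out_w2 = tension_layer(q, k, v, window=2)
print(out_w1)
print(out_w2)
```

Unlike softmax, the gates at position 2 with `window=2` are 0.5 and 0.5 here only because the scores happen to be zero; in general they are independent and can sum to anything between 0 and the window size.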

config.json ADDED

	@@ -0,0 +1,15 @@

+{
+  "arch": "tension",
+  "vocab_size": 32768,
+  "dim": 256,
+  "num_layers": 6,
+  "num_heads": 4,
+  "window": 32,
+  "ffn_mult": 3,
+  "max_seq_len": 256,
+  "dropout": 0.1,
+  "use_grad_checkpoint": false,
+  "use_oscillation": true,
+  "use_rope": false,
+  "use_triton": false
+}
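A few quantities implied by this config can be derived with a short script. The sketch below copies the relevant fields from config.json; the interpretations are assumptions, not confirmed by the source: that `dim` is split evenly across `num_heads`, that the feed-forward hidden size is `dim * ffn_mult`, and that stacking windowed layers lets information travel back at most `window * num_layers` positions (the README's "32 tokens per layer × 6 layers"). The `derive` helper is hypothetical.

```python
import json

# Subset of the fields from config.json above.
CONFIG_JSON = """
{
  "dim": 256,
  "num_layers": 6,
  "num_heads": 4,
  "window": 32,
  "ffn_mult": 3,
  "max_seq_len": 256
}
"""


def derive(cfg: dict) -> dict:
    """Compute derived sizes under the assumptions stated in the lead-in."""
    if cfg["dim"] % cfg["num_heads"] != 0:
        raise ValueError("dim must be divisible by num_heads")
    return {
        # Assumption: each head gets an equal slice of the model dimension.
        "head_dim": cfg["dim"] // cfg["num_heads"],
        # Assumption: ffn_mult scales the model dimension for the FFN.
        "ffn_hidden": cfg["dim"] * cfg["ffn_mult"],
        # Each layer looks back up to `window` tokens, so a stack of layers
        # can propagate information at most window * num_layers positions.
        "max_lookback": cfg["window"] * cfg["num_layers"],
    }


cfg = json.loads(CONFIG_JSON)
derived = derive(cfg)
print(derived)
print("lookback fits in max_seq_len:", derived["max_lookback"] <= cfg["max_seq_len"])
```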

pytorch_model.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:47cf3802b0266dbe637784630c482b6d8a10a864019a6c9df621fe6291ef8704
+size 162978067
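The three lines above are a Git LFS pointer, not the checkpoint itself: they record the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper, and the parameter count used for the bytes-per-parameter figure is taken from the README's model card table.

```python
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:47cf3802b0266dbe637784630c482b6d8a10a864019a6c9df621fe6291ef8704
size 162978067
"""

PARAM_COUNT = 13_573_894  # from the README's model card table


def parse_lfs_pointer(text: str) -> dict:
    """Parse 'key value' lines of a Git LFS pointer (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        line = line.lstrip("+").strip()   # tolerate diff '+' prefixes
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
size_mib = pointer["size_bytes"] / 2**20
bytes_per_param = pointer["size_bytes"] / PARAM_COUNT
print(f"{pointer['hash_algo']} digest, {size_mib:.1f} MiB, "
      f"{bytes_per_param:.2f} bytes per parameter")
```

The file works out to roughly 12 bytes per parameter, which is more than fp32 weights alone would need. That would be consistent with the checkpoint dictionary holding additional state beyond `ckpt["model"]`, but the source does not say what else it contains.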

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff