Phase 2 TS-native model — 13.5M params, open-web-math, val PPL 86.50
- README.md +69 -0
- config.json +15 -0
- pytorch_model.pt +3 -0
- tokenizer.json +0 -0
README.md
ADDED
@@ -0,0 +1,69 @@
---
language: en
license: mit
tags:
- language-model
- tension
- causal-lm
- novel-architecture
---

# TensionLM

A language model built on sigmoid *tension* instead of softmax attention.

## Architecture

Standard transformers use softmax attention: every position competes for a
fixed budget that sums to 1. TensionLM replaces this with independent sigmoid
scores, so each token pair is judged on its own merits rather than in
competition with the others.

```
tau[t, w] = sigmoid( dot(q_t, k_{t-w-1}) / √d )
output[t] = Σ_w tau[t, w] * v_{t-w-1}
```

Here `t` is the current position, `w` is an offset into the window of earlier
positions (32 per layer in this model), and `d` is the query/key dimension.
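
The two formulas above can be sketched in plain Python. This is a minimal single-head illustration under stated assumptions, not the repository's implementation: the function name `tension_layer` is hypothetical, positions before the start of the sequence are simply skipped, and multi-head projections and the `use_oscillation` option from `config.json` are left out.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def tension_layer(q, k, v, window):
    """Windowed sigmoid tension for one head (illustrative sketch).

    q, k, v are lists of T vectors (lists of floats).
    Returns (taus, outputs), where taus[t][w] scores position t - w - 1.
    """
    scale = math.sqrt(len(q[0]))
    taus, outputs = [], []
    for t in range(len(q)):
        row = []
        out = [0.0] * len(v[0])
        for w in range(window):
            s = t - w - 1  # source position, strictly in the past
            if s < 0:
                break  # assumption: nothing to read before the sequence start
            score = sum(a * b for a, b in zip(q[t], k[s])) / scale
            tau = sigmoid(score)  # each pair is scored independently
            row.append(tau)
            out = [o + tau * x for o, x in zip(out, v[s])]
        taus.append(row)
        outputs.append(out)
    return taus, outputs


# With all-zero queries and keys every score is sigmoid(0) = 0.5.
q = [[0.0, 0.0] for _ in range(4)]
k = [[0.0, 0.0] for _ in range(4)]
v = [[1.0], [2.0], [4.0], [8.0]]
taus, outputs = tension_layer(q, k, v, window=3)
print(sum(taus[3]))  # 1.5: the scores are not normalised to sum to 1
```

Unlike softmax weights, the scores in a row can sum to any value between 0 and the window size, which is the property the section describes.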

## Usage

```python
import torch
from model import TensionConfig, TensionLM, generate
from tokenizers import Tokenizer

# The checkpoint is a pickled dict, hence weights_only=False;
# only load checkpoints you trust.
ckpt = torch.load("pytorch_model.pt", map_location="cpu", weights_only=False)
model = TensionLM(TensionConfig(**ckpt["cfg"]))
# Strip the "_orig_mod." prefix that torch.compile adds to parameter names.
state = {k.replace("_orig_mod.", ""): v for k, v in ckpt["model"].items()}
model.load_state_dict(state)
model.eval()  # disable dropout for generation
tokenizer = Tokenizer.from_file("tokenizer.json")

enc = tokenizer.encode("The cat sat")
ids = generate(model, enc.ids, max_new=100, temp=0.8, top_p=0.92)
result = tokenizer.decode(ids)
print(result)
```

Or use the CLI:
```bash
python3 generate.py --checkpoint pytorch_model.pt --prompt "The cat sat"
```
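
For readers unfamiliar with the `temp` and `top_p` arguments, the sketch below shows what such parameters conventionally mean. It assumes that `generate` performs standard temperature scaling followed by nucleus (top-p) sampling, which the README does not confirm. The function `sample_top_p` is a hypothetical stand-in written for illustration, not code from the repository.

```python
import math
import random


def sample_top_p(logits, temp=0.8, top_p=0.92, rng=random):
    """Pick one token index using temperature and nucleus sampling."""
    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temp for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Sample among the kept tokens in proportion to their probabilities.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


# One token dominates, so the nucleus contains only that token.
print(sample_top_p([10.0, 0.0, 0.0]))  # 0
```

Lower `temp` sharpens the distribution, and lower `top_p` shrinks the set of candidate tokens, so both make the output more conservative.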

## Training

Trained for 30,518 steps on wikitext-2-raw-v1. See [github.com/BoggersTheFish/bozo](https://github.com/BoggersTheFish/bozo) for training code.

## Model card

| Property | Value |
|----------|-------|
| Parameters | 13,573,894 |
| Architecture | TensionLM (sigmoid tension, windowed) |
| Dataset | wikitext-2-raw-v1 |
| Val PPL | 86.50 |
| Context window | 32 tokens per layer × 6 layers |
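
The context window row can be read as a receptive-field calculation. The sketch assumes that each layer lets a position read at most `window` earlier positions, as in the Architecture formula, and that nothing else mixes information across positions. Under those assumptions, information can travel back at most `window × num_layers` positions through the stack. The helper name `max_lookback` is illustrative only.

```python
def max_lookback(window: int, num_layers: int) -> int:
    """Upper bound on how many positions back information can propagate."""
    return window * num_layers


window, num_layers = 32, 6  # values from the model card table
print(max_lookback(window, num_layers))  # 192
```

This is an upper bound on reach, not a claim about how well the model uses distant tokens.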

## Limitations

This is a research model. It does not follow instructions, has not been
fine-tuned, and may produce incoherent or incorrect text. It is intended
to demonstrate the tension mechanism, not as a production system.
config.json
ADDED
@@ -0,0 +1,15 @@
{
  "arch": "tension",
  "vocab_size": 32768,
  "dim": 256,
  "num_layers": 6,
  "num_heads": 4,
  "window": 32,
  "ffn_mult": 3,
  "max_seq_len": 256,
  "dropout": 0.1,
  "use_grad_checkpoint": false,
  "use_oscillation": true,
  "use_rope": false,
  "use_triton": false
}
pytorch_model.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:47cf3802b0266dbe637784630c482b6d8a10a864019a6c9df621fe6291ef8704
size 162978067
tokenizer.json
ADDED
The diff for this file is too large to render.