BoggersTheFish committed
Commit f52c9f1 · verified · 1 parent: ffeed55

Phase 2 TS-native model — 13.5M params, open-web-math, val PPL 86.50

Files changed (4)
  1. README.md +69 -0
  2. config.json +15 -0
  3. pytorch_model.pt +3 -0
  4. tokenizer.json +0 -0
README.md ADDED
@@ -0,0 +1,69 @@
+ ---
+ language: en
+ license: mit
+ tags:
+ - language-model
+ - tension
+ - causal-lm
+ - novel-architecture
+ ---
+
+ # TensionLM
+
+ A language model trained on sigmoid *tension* instead of softmax attention.
+
+ ## Architecture
+
+ Standard transformers use softmax attention: every position competes for a
+ fixed budget that sums to 1. TensionLM replaces this with independent sigmoid
+ scores: each token pair is judged on its own merits, not in competition with
+ others.
+
+ ```
+ tau[t, w] = sigmoid( dot(q_t, k_{t-w-1}) / √d )
+ output[t] = Σ_w tau[t, w] * v_{t-w-1}
+ ```
+
+ ## Usage
+
+ ```python
+ import torch
+ from model import TensionConfig, TensionLM, generate
+ from tokenizers import Tokenizer
+
+ ckpt = torch.load("pytorch_model.pt", map_location="cpu", weights_only=False)
+ model = TensionLM(TensionConfig(**ckpt["cfg"]))
+ state = {k.replace("_orig_mod.", ""): v for k, v in ckpt["model"].items()}
+ model.load_state_dict(state)
+ tokenizer = Tokenizer.from_file("tokenizer.json")
+
+ enc = tokenizer.encode("The cat sat")
+ ids = generate(model, enc.ids, max_new=100, temp=0.8, top_p=0.92)
+ result = tokenizer.decode(ids)
+ print(result)
+ ```
+
+ Or use the CLI:
+ ```bash
+ python3 generate.py --checkpoint pytorch_model.pt --prompt "The cat sat"
+ ```
+
+ ## Training
+
+ Trained for 30518 steps on wikitext-2-raw-v1. See [github.com/BoggersTheFish/bozo](https://github.com/BoggersTheFish/bozo) for training code.
+
+ ## Model card
+
+ | Property | Value |
+ |----------|-------|
+ | Parameters | 13,573,894 |
+ | Architecture | TensionLM (sigmoid tension, windowed) |
+ | Dataset | wikitext-2-raw-v1 |
+ | Val PPL | 86.50 |
+ | Context window | 32 tokens per layer × 6 layers |
+
+ ## Limitations
+
+ This is a research model. It does not follow instructions, has not been
+ fine-tuned, and may produce incoherent or incorrect text. It is intended
+ to demonstrate the tension mechanism, not as a production system.
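Review note on the README's tension formula: a concrete version may help readers check their understanding. The following is a minimal single-head sketch in plain Python, not the repository's implementation. The function name `tension_layer`, the list-of-lists layout, and the choice to skip window slots that fall before the start of the sequence are assumptions made for illustration; multi-head projection and the `use_oscillation` option are left out.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def tension_layer(q, k, v, window):
    """Windowed sigmoid tension for one head (illustrative sketch).

    q, k, v are lists of T vectors (lists of floats). For each position t:
        tau[t, w]  = sigmoid(dot(q[t], k[t-w-1]) / sqrt(d))
        output[t]  = sum over w of tau[t, w] * v[t-w-1]
    with w in range(window). Slots where t-w-1 < 0 are skipped; that
    boundary handling is an assumption, not taken from the source.
    """
    scale = math.sqrt(len(q[0]))
    out = []
    for t in range(len(q)):
        acc = [0.0] * len(v[0])
        for w in range(window):
            s = t - w - 1  # strictly earlier position; t never scores itself
            if s < 0:
                break
            score = sum(a * b for a, b in zip(q[t], k[s])) / scale
            tau = sigmoid(score)  # independent gate, no normalisation over w
            for i, x in enumerate(v[s]):
                acc[i] += tau * x
        out.append(acc)
    return out


# Zero queries and keys give tau = 0.5 for every pair.
zeros = [[0.0, 0.0] for _ in range(3)]
values = [[1.0], [2.0], [4.0]]
print(tension_layer(zeros, zeros, values, window=2))
# prints [[0.0], [0.5], [1.5]]
```

At the last position the two gates sum to 1.0 only by coincidence of the all-zero scores; with a wider window or non-zero scores the gates are free to sum to any value between 0 and the window size, which is the difference from softmax that the README describes.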
config.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "arch": "tension",
+   "vocab_size": 32768,
+   "dim": 256,
+   "num_layers": 6,
+   "num_heads": 4,
+   "window": 32,
+   "ffn_mult": 3,
+   "max_seq_len": 256,
+   "dropout": 0.1,
+   "use_grad_checkpoint": false,
+   "use_oscillation": true,
+   "use_rope": false,
+   "use_triton": false
+ }
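Review note on config.json: the README's "32 tokens per layer × 6 layers" can be read off these fields. The sketch below derives two quantities under stated assumptions: that the per-head width is `dim // num_heads` (the usual convention, not confirmed by this commit), and that each stacked layer extends the reachable history by at most `window` tokens, capped by `max_seq_len`. The helper name `summarize` is hypothetical.

```python
import json

# Subset of the committed config.json, inlined so the example is self-contained.
CONFIG_JSON = """
{
  "dim": 256,
  "num_layers": 6,
  "num_heads": 4,
  "window": 32,
  "max_seq_len": 256
}
"""


def summarize(cfg: dict) -> dict:
    # Assumed convention: the model width is split evenly across heads.
    head_dim = cfg["dim"] // cfg["num_heads"]
    # Each layer looks back at most `window` tokens, so stacking layers
    # gives an upper bound of window * num_layers on how far back
    # information can travel, limited by the sequence length itself.
    max_lookback = min(cfg["window"] * cfg["num_layers"], cfg["max_seq_len"] - 1)
    return {"head_dim": head_dim, "max_lookback": max_lookback}


print(summarize(json.loads(CONFIG_JSON)))
# prints {'head_dim': 64, 'max_lookback': 192}
```

The 192-token figure is an upper bound on reach, not a claim about how much distant context the trained model actually uses.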
pytorch_model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:47cf3802b0266dbe637784630c482b6d8a10a864019a6c9df621fe6291ef8704
+ size 162978067
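Review note on the LFS pointer: in a Git LFS pointer file, `oid sha256:` is the SHA-256 digest of the real file's contents and `size` is its length in bytes, so a downloaded checkpoint can be checked against this diff before loading it. A minimal sketch follows; the helper name `verify_lfs_object` and the local path are assumptions for illustration.

```python
import hashlib
import os


def verify_lfs_object(path: str, expected_oid: str, expected_size: int) -> bool:
    """Return True if the file at `path` matches an LFS pointer's oid and size."""
    # Cheap check first: a size mismatch means there is no need to hash.
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so a large checkpoint is never fully in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_oid


# Values copied from the pointer in this commit; the path is assumed.
OID = "47cf3802b0266dbe637784630c482b6d8a10a864019a6c9df621fe6291ef8704"
SIZE = 162978067

if os.path.exists("pytorch_model.pt"):
    print(verify_lfs_object("pytorch_model.pt", OID, SIZE))
```

This check is worth running here in particular because the README loads the checkpoint with `weights_only=False`, which unpickles arbitrary objects.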
tokenizer.json ADDED
The diff for this file is too large to render.