dambun — a GPT from scratch, in a stack language

A character-level Transformer language model, trained and sampled by a single statically-linked 32-bit ELF that was assembled by a self-hosted compiler for a tiny stack language called kuku.

No Python. No C. No libc. No imports. No pretrained anything. Just Linux int 0x80 syscalls, x87 FPU doubles, and ~1,750 lines of source that compile down to a 32 kB ELF.

Upstream: github.com/australia/kukuos.

What's in the box

| file | size | what it is |
|---|---|---|
| `ngunnga.kuku` | 44 KB | source of the kuku compiler (self-hosting) |
| `ngunnga` | 24 KB | the compiler, as a 32-bit Linux ELF |
| `dambun.kuku` | 61 KB | source of the GPT |
| `dambun` | 32 KB | the GPT, as a 32-bit Linux ELF |
| `dambun.kuku-gpt-model` | 262 KB | trained weights (4,192 params, FP64) + Adam state |
| `names.txt` | 222 KB | training corpus: 32,032 lowercase names |

Architecture

Identical shape to Karpathy's minGPT / microGPT:

vocab_size = 27 — 26 lowercase letters + BOS
n_embd = 16
n_layer = 1
n_head = 4, head_dim = 4
block_size = 16
one block: pre-RMSnorm → multi-head self-attention with KV cache → residual → pre-RMSnorm → 64-hidden MLP (ReLU) → residual → lm_head
cross-entropy per position, averaged across positions
Adam (β₁=0.85, β₂=0.99, ε=1e-8)
Total parameters: 4,192 (all FP64, allocated as 48-byte Value cells)
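
As a sanity check on the 4,192 figure, the count can be reproduced from the hyper-parameters listed above. The breakdown below is an assumption, not something read out of dambun.kuku: it supposes no biases, no learnable norm gains, a learned position table of block_size × n_embd, and an lm_head separate from the token embedding. Under those assumptions the sum matches exactly.

```python
# Hyper-parameters as listed in the Architecture section.
vocab_size, n_embd, n_layer, block_size = 27, 16, 1, 16
mlp_hidden = 64

# Assumed breakdown (not confirmed by the source): no biases, no norm gains.
token_embedding = vocab_size * n_embd        # 27 * 16 = 432
position_embedding = block_size * n_embd     # 16 * 16 = 256
attention = n_layer * 4 * n_embd * n_embd    # Q, K, V and output projections = 1024
mlp = n_layer * 2 * n_embd * mlp_hidden      # up and down projections = 2048
lm_head = vocab_size * n_embd                # 27 * 16 = 432

total = token_embedding + position_embedding + attention + mlp + lm_head
print(total)  # 4192
```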

Built from the FPU up: a 48-byte Value struct with .data, .grad, .child1/2, and local gradients; a topologically-sorted reverse pass driven by bana-@/bana-! FPU loads/stores; a bump allocator for the per-step graph; Gaussian init via Box–Muller seeded from /dev/urandom.
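
For readers who do not read kuku, here is a minimal Python sketch of the same idea: a scalar node that stores its value, its gradient, up to two children, and one local gradient per child, plus a reverse pass over a topological order. The field names mirror the description above, but the class, the helper names (`add`, `mul`, `backward`) and the layout are illustrative assumptions, not a translation of the kuku source.

```python
class Value:
    """Scalar graph node: data, grad, up to two children, one local gradient per child."""

    def __init__(self, data, child1=None, child2=None, local1=0.0, local2=0.0):
        self.data = data
        self.grad = 0.0
        self.child1, self.child2 = child1, child2
        self.local1, self.local2 = local1, local2  # d(self)/d(child1), d(self)/d(child2)


def add(a, b):
    return Value(a.data + b.data, a, b, 1.0, 1.0)


def mul(a, b):
    return Value(a.data * b.data, a, b, b.data, a.data)


def backward(root):
    """Topologically sort the graph, then push gradients from root to leaves."""
    order, seen = [], set()

    def visit(node):
        if node is None or id(node) in seen:
            return
        seen.add(id(node))
        visit(node.child1)
        visit(node.child2)
        order.append(node)  # children are appended before their parent

    visit(root)
    root.grad = 1.0
    for node in reversed(order):  # parents are processed before children
        if node.child1 is not None:
            node.child1.grad += node.local1 * node.grad
        if node.child2 is not None:
            node.child2.grad += node.local2 * node.grad


x, y = Value(2.0), Value(5.0)
z = add(mul(x, y), x)  # z = x*y + x
backward(z)
print(z.data, x.grad, y.grad)  # 12.0 6.0 2.0
```

Storing the local gradients at construction time means the reverse pass needs no per-operation dispatch, which is presumably why a fixed-size cell is enough for every node type.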

Requirements

The binaries are 32-bit x86 ELFs built for Linux. On a 64-bit host you need IA32 emulation enabled (CONFIG_IA32_EMULATION=y, default on Ubuntu). On most modern distros this just works; no sudo apt install needed.

chmod +x ngunnga dambun

Training

./dambun binalku names.txt

Command: binalku means "remember/learn" — this is the training subcommand.
Argument: a path to a text file with one lowercase name per line (26-letter alphabet; uppercase and other characters are tokenised as BOS). We ship names.txt (32k US first names, the standard makemore/minGPT benchmark).
What it does:
1. Seeds the RNG from /dev/urandom.
2. Allocates 4,192 Value cells in the persistent params pool and Gauss-initialises them with std 0.08.
3. Opens names.txt and reads it into memory.
4. For N training steps (default 100), takes the next line, tokenises [BOS, c₁, …, cₙ, BOS], runs one forward + backward + Adam step.
5. Every 10 steps prints warri {step} birru*1000 {loss × 1000} to stdout.
6. Saves {params, Adam m, Adam v} to /tmp/dambun.kuku-gpt-model as a flat 262 KB blob of IEEE-754 doubles.
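
The tokenisation in step 4 can be sketched in Python as follows. The token ids are an assumption: the sketch maps a..z to 0..25 and gives BOS the last id, 26, which the source does not confirm. The function name `tokenise` is likewise hypothetical.

```python
BOS = 26  # assumption: letters are 0..25 and BOS takes the last of the 27 ids


def tokenise(line):
    """Return [BOS, c1, ..., cn, BOS]; anything outside a..z becomes BOS."""
    body = []
    for ch in line.rstrip("\n"):
        if "a" <= ch <= "z":
            body.append(ord(ch) - ord("a"))
        else:
            body.append(BOS)
    return [BOS] + body + [BOS]


tokens = tokenise("emma\n")
print(tokens)  # [26, 4, 12, 12, 0, 26]

# Each adjacent pair is one (input, target) training position.
pairs = list(zip(tokens[:-1], tokens[1:]))
print(len(pairs))  # 5
```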

Expected output on names.txt:

bana mana, binalku jakalbaku...
warri 0   birru*1000 3296
warri 10  birru*1000 1879
warri 20  birru*1000 1412
...
warri 90  birru*1000 1006
bayan balkal kunbayn.

Cross-entropy falls from ≈ ln(27) ≈ 3.30, the loss of a uniform guess over 27 tokens, to about 1.01 in 100 steps at the default lr = 1e-3. Wall clock on one x86 core: ≈ 1 s.
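
Each of those steps ends in an Adam update. A minimal Python sketch of one update, using the β₁, β₂, ε and lr values quoted in this README, is shown below. The bias-correction terms are an assumption (standard Adam has them; whether dambun.kuku does is not stated here), and `adam_step` is a hypothetical name.

```python
import math


def adam_step(params, grads, m, v, step, lr=1e-3, beta1=0.85, beta2=0.99, eps=1e-8):
    """Update params, m and v in place. step is 1-based.

    Bias correction is assumed, not confirmed by the source.
    """
    for i, g in enumerate(grads):
        m[i] = beta1 * m[i] + (1.0 - beta1) * g        # first-moment estimate
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g    # second-moment estimate
        m_hat = m[i] / (1.0 - beta1 ** step)
        v_hat = v[i] / (1.0 - beta2 ** step)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


params, m, v = [0.5], [0.0], [0.0]
adam_step(params, [2.0], m, v, step=1)
print(params[0])  # close to 0.499: the first step moves by about lr
```

The m and v lists are the "Adam state" that gets saved alongside the weights, so a later run could in principle resume from them.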

The step count and hyper-parameters are compiled into the binary. To change them, edit dambun.kuku (100 binalku-narmba near the bottom; kuku-lr near the top) and recompile with ./ngunnga --bama dambun dambun.kuku.
If you already have the trained weights in this folder, skip training: cp dambun.kuku-gpt-model /tmp/ and jump to inference.

Inference (sampling)

./dambun balkalaway

Command: balkalaway means "let us speak together" — inference.
Prerequisite: /tmp/dambun.kuku-gpt-model must exist (either from binalku above or copied from this folder).
What it does:
1. Seeds the RNG from /dev/urandom.
2. Allocates a fresh Value arena and loads weights from disk.
3. Sets temperature to 1.0 (edit dambun.kuku to change).
4. For 20 iterations:
  - Start with [BOS] at position 0.
  - At each of up to 16 positions, run the forward pass to get 27 output logits, divide by temperature, softmax, and sample the next character from the multinomial.
  - Stop when BOS is produced again, or after 16 characters.
  - Print the sampled name (without the BOS markers) and a newline.
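
The sampling loop above can be sketched in Python like this. It is an illustration only: `forward(token, pos)` is a hypothetical stand-in for the model's forward pass (the real one keeps a KV cache between positions), the BOS id of 26 is an assumption, and `dummy_forward` just returns uniform logits so the block runs on its own.

```python
import math
import random

BOS = 26         # assumption: BOS is the last of the 27 token ids
BLOCK_SIZE = 16


def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs, rng):
    """Draw one index from a categorical distribution by inverting the CDF."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1                    # guard against rounding in the sum


def sample_name(forward, rng, temperature=1.0):
    token, chars = BOS, []
    for pos in range(BLOCK_SIZE):
        logits = forward(token, pos)         # 27 logits for the next character
        token = sample_index(softmax(logits, temperature), rng)
        if token == BOS:                     # BOS doubles as end-of-name
            break
        chars.append(chr(ord("a") + token))
    return "".join(chars)


def dummy_forward(token, pos):
    # Hypothetical stand-in for the model: uniform logits over 27 tokens.
    return [0.0] * 27


for _ in range(3):
    print(sample_name(dummy_forward, random.Random()))
```

Lowering the temperature below 1.0 sharpens the distribution before sampling; raising it flattens it.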
Expected output (with the trained model that ships in this repo):
```
lauosvzlu
llllfka
llt
llab
lllllalkmll
...
```
A 4,192-parameter character-level model trained for 100 single-example Adam steps produces bigram-like strings, not English names. This is a faithful port of a tiny teaching model; the training loss curve and the sampling mechanics are what's being demonstrated.

Autograd self-test

./dambun ngana-warri

Runs gradient descent on f(x) = (x − 3)² starting at x = 0 for 3,000 steps. Should print:

ngana-warri: jakalbaku x * 1000 = 0
ngana-warri: kunbayn    x * 1000 = 3000

i.e. x starts at 0 and converges to 3.0 (printed × 1000 because the built-in number printer is integer-only). If this doesn't produce 3000, something is wrong with your floating-point setup before you even get to the Transformer.
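
For comparison, the same experiment in plain Python looks like this. It uses the analytic gradient 2(x − 3) rather than an autograd graph, and the step size of 0.01 is an assumption, since the README does not state the one compiled into the binary.

```python
def self_test(steps=3000, lr=0.01):
    # lr is an assumption; the source does not state the step size.
    x = 0.0
    for _ in range(steps):
        grad = 2.0 * (x - 3.0)  # d/dx (x - 3)^2
        x -= lr * grad
    return x


x = self_test()
print(round(x * 1000))  # 3000
```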

Rebuilding from source

# 1) Rebuild the compiler (self-host).
./ngunnga --bama ngunnga-new ngunnga.kuku
chmod +x ngunnga-new
# ngunnga-new should be byte-identical to ngunnga; cmp prints nothing if so.
cmp ngunnga ngunnga-new

# 2) Rebuild the GPT.
./ngunnga-new --bama dambun-new dambun.kuku
chmod +x dambun-new

--bama means "produce a Linux ELF." Without the flag the compiler emits a bare-metal kernel ELF for the kuku-os boot target.

The kuku words, briefly

| word | role |
|---|---|
| `balkalaway … kunbayn` | function definition / end |
| `yabarrka` / `janay` | loop / break |
| `yala` / `yinya` | if / else |
| `kujil` / `wuljil` / `wundil` | dup / drop / swap (data stack) |
| `muru` / `dumbarril` | int add / subtract |
| `bana-*` | f64 (FPU) variants: `bana-muru`, `bana-*`, `bana-/`, `bana-sqrt`, `bana-log`, `bana-exp`, … |
| `@` / `!` | load / store (int); `bana-@` / `bana-!` load/store FP64 |
| `binalku` | training subcommand |
| `balkalaway` (as subcommand) | inference subcommand |
| `jurra` | softmax |
| `jalkar` | RMSnorm |
| `miyil` | attention |
| `dukurr` | Adam optimiser |
| `wumba` | RNG |
| `bayan` | the model |
| `bana` | floats |

License

MIT.
