dambun: a GPT from scratch, in a stack language
A character-level Transformer language model, trained and sampled by a single statically-linked 32-bit ELF that was assembled by a self-hosted compiler for a tiny stack language called kuku.
No Python. No C. No libc. No imports. No pretrained anything. Just
Linux int 0x80 syscalls, x87 FPU doubles, and ~1,750 lines of source
that compile down to a 32 kB ELF.
Upstream: github.com/australia/kukuos.
What's in the box
| file | size | what it is |
|---|---|---|
| ngunnga.kuku | 44 KB | source of the kuku compiler (self-hosting) |
| ngunnga | 24 KB | the compiler, as a 32-bit Linux ELF |
| dambun.kuku | 61 KB | source of the GPT |
| dambun | 32 KB | the GPT, as a 32-bit Linux ELF |
| dambun.kuku-gpt-model | 262 KB | trained weights (4,192 params, FP64) + Adam state |
| names.txt | 222 KB | training corpus: 32,032 lowercase names |
Architecture
Identical shape to Karpathy's minGPT / microGPT:
- vocab_size = 27 (26 lowercase letters + BOS)
- n_embd = 16
- n_layer = 1
- n_head = 4, head_dim = 4
- block_size = 16
- one block: pre-RMSnorm → multi-head self-attention with KV cache → residual → pre-RMSnorm → 64-hidden MLP (ReLU) → residual → lm_head
- cross-entropy per position, averaged across positions
- Adam (β₁=0.85, β₂=0.99, ε=1e-8)
- Total parameters: 4,192 (all FP64, allocated as 48-byte Value cells)
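The 4,192 figure can be checked by hand from the hyper-parameters above. The sketch below is in Python purely for illustration (the project itself uses none). The tensor names are hypothetical, and it assumes no bias terms, no learnable RMSnorm gain, and an lm_head that is not tied to the token embedding; under those assumptions the shapes add up to the stated total.

```python
# Hypothetical tensor names; shapes follow the hyper-parameters listed above.
# Assumptions: no bias terms, no learnable RMSnorm gain, untied lm_head.
vocab_size, n_embd, n_layer, block_size, mlp_hidden = 27, 16, 1, 16, 64

shapes = {
    "wte": (vocab_size, n_embd),      # token embeddings
    "wpe": (block_size, n_embd),      # position embeddings
    "lm_head": (vocab_size, n_embd),  # output projection
}
for layer in range(n_layer):
    for name in ("attn_wq", "attn_wk", "attn_wv", "attn_wo"):
        shapes[f"layer{layer}.{name}"] = (n_embd, n_embd)
    shapes[f"layer{layer}.mlp_fc1"] = (mlp_hidden, n_embd)
    shapes[f"layer{layer}.mlp_fc2"] = (n_embd, mlp_hidden)

total = sum(rows * cols for rows, cols in shapes.values())
print(total)  # 4192
```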
Built from the FPU up: a 48-byte Value struct with .data, .grad,
.child1/2, and local gradients; a topologically-sorted reverse pass
driven by bana-@/bana-! FPU loads/stores; a bump allocator for the
per-step graph; Gaussian init via Box-Muller seeded from /dev/urandom.
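For readers who have not seen this style of autograd, here is a minimal Python sketch of the same idea: a scalar node holding data, grad, up to two children and their local gradients, plus a reverse pass over a topological order. It illustrates the technique only; the class layout and method names are assumptions and do not mirror the kuku source or its 48-byte memory layout.

```python
class Value:
    """Scalar node: data, grad, up to two children, and local gradients."""

    def __init__(self, data, children=(), local_grads=()):
        self.data = float(data)
        self.grad = 0.0
        self.children = children        # the (at most two) inputs of this op
        self.local_grads = local_grads  # d(self)/d(child), one per child

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Build a topological order (inputs before outputs) with an iterative DFS.
        topo, visited = [], set()
        stack = [(self, False)]
        while stack:
            node, expanded = stack.pop()
            if expanded:
                topo.append(node)
            elif id(node) not in visited:
                visited.add(id(node))
                stack.append((node, True))
                for child in node.children:
                    stack.append((child, False))
        # Walk it in reverse, pushing gradients to inputs via the chain rule.
        self.grad = 1.0
        for node in reversed(topo):
            for child, local in zip(node.children, node.local_grads):
                child.grad += local * node.grad


a, b = Value(2.0), Value(3.0)
y = a * b + a          # dy/da = b + 1, dy/db = a
y.backward()
print(a.grad, b.grad)  # 4.0 2.0
```

Note that a is used twice in the expression, so its gradient accumulates from both paths; this is why the reverse pass needs a topological order rather than a plain recursive walk.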
Requirements
The binaries are 32-bit x86 ELFs built for Linux. On a 64-bit host you
need IA32 emulation enabled (CONFIG_IA32_EMULATION=y, default on
Ubuntu). On most modern distros this just works; no sudo apt install
needed.
chmod +x ngunnga dambun
Training
./dambun binalku names.txt
Command: binalku means "remember/learn"; this is the training subcommand.

Argument: a path to a text file with one lowercase name per line (26-letter alphabet; uppercase and other characters are tokenised as BOS). We ship names.txt (32k US first names, the standard makemore/minGPT benchmark).

What it does:
- Seeds the RNG from /dev/urandom.
- Allocates 4,192 Value cells in the persistent params pool and Gauss-initialises them with std 0.08.
- Opens names.txt and reads it into memory.
- For N training steps (default 100), takes the next line, tokenises [BOS, c₁, …, cₙ, BOS], runs one forward + backward + Adam step.
- Every 10 steps prints warri {step} birru*1000 {loss × 1000} to stdout.
- Saves {params, Adam m, Adam v} to /tmp/dambun.kuku-gpt-model as a flat 262 KB blob of IEEE-754 doubles.
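The tokenisation step can be pictured with a short Python sketch. The id assignment here (a..z as 0..25, BOS as 26) is an assumption made for illustration; the README only states that there are 27 tokens and that anything outside a..z is treated as BOS. Stripping the trailing newline before tokenising is likewise an assumption.

```python
BOS = 26  # assumed id for the boundary token; letters a..z map to 0..25


def tokenise(line: str) -> list[int]:
    """Wrap one name in BOS markers; anything outside a..z becomes BOS."""
    ids = [ord(ch) - ord("a") if "a" <= ch <= "z" else BOS for ch in line.strip()]
    return [BOS] + ids + [BOS]


tokens = tokenise("emma\n")
print(tokens)  # [26, 4, 12, 12, 0, 26]

# Each training position predicts the next token from the current one
# (plus everything before it, via attention).
pairs = list(zip(tokens[:-1], tokens[1:]))
```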
Expected output on names.txt:
bana mana, binalku jakalbaku...
warri 0 birru*1000 3296
warri 10 birru*1000 1879
warri 20 birru*1000 1412
...
warri 90 birru*1000 1006
bayan balkal kunbayn.

Cross-entropy falls from ≈ log(27) = 3.30 to about 1.01 in 100 steps at the default lr = 1e-3. Wall clock on one x86 core: ≈ 1 s.
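The starting loss is what a model that guesses uniformly over the 27 tokens would score, which is why the first reported value sits near 3296. A quick Python check of that baseline, using the natural logarithm (cross-entropy in nats):

```python
import math

vocab_size = 27
baseline = math.log(vocab_size)   # cross-entropy of a uniform guess, in nats
print(round(baseline, 4))         # 3.2958
```

A freshly initialised model is only approximately uniform, so the step-0 value will differ slightly from this figure.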
To change the step count or hyper-parameters: they're compiled into the binary. Edit dambun.kuku (100 binalku-narmba near the bottom; kuku-lr near the top) and recompile with:
./ngunnga --bama dambun dambun.kuku

If you already have the trained weights in this folder, skip training:
cp dambun.kuku-gpt-model /tmp/
and jump to inference.
Inference (sampling)
./dambun balkalaway
Command: balkalaway means "let us speak together"; this is the inference subcommand.

Prerequisite: /tmp/dambun.kuku-gpt-model must exist (either from binalku above or copied from this folder).

What it does:
- Seeds the RNG from /dev/urandom.
- Allocates a fresh Value arena and loads weights from disk.
- Sets temperature to 1.0 (edit dambun.kuku to change).
- For 20 iterations:
  - Start with [BOS] at position 0.
  - At each of up to 16 positions, run the forward pass to get 27 output logits, divide by temperature, softmax, and sample the next character from the multinomial.
  - Stop when BOS is produced again, or after 16 characters.
  - Print the sampled name (without the BOS markers) and a newline.
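The per-position sampling step (divide by temperature, softmax, multinomial draw) can be sketched in Python as follows. The function name and the max-subtraction trick for numerical stability are illustrative choices, not a description of how the kuku code is written, and the fixed seed replaces the /dev/urandom seeding only so the example is repeatable.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Temperature-scaled softmax followed by one multinomial draw."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    r = rng.random() * total                 # point in [0, total)
    running = 0.0
    for token, e in enumerate(exps):
        running += e
        if r < running:
            return token
    return len(exps) - 1                     # guard against rounding at the top end


rng = random.Random(0)           # fixed seed here; dambun seeds from /dev/urandom
logits = [0.0] * 27
logits[3] = 50.0                 # make token 3 overwhelmingly likely
print(sample_token(logits, 1.0, rng))  # 3
```

Lower temperatures sharpen the distribution toward the largest logit; higher ones flatten it toward uniform.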
Expected output (with the trained model that ships in this repo):
lauosvzlu
llllfka
llt
llab
lllllalkmll
...

A 4,192-parameter model on a character-level task with single-example SGD produces character bigrams, not English names. This is a faithful port of a tiny teaching model; the training loss curve and the sampling mechanics are what's being demonstrated.
Autograd self-test
./dambun ngana-warri
Runs gradient descent on f(x) = (x − 3)² starting at x = 0 for
3,000 steps. Should print:
ngana-warri: jakalbaku x * 1000 = 0
ngana-warri: kunbayn x * 1000 = 3000
i.e. x starts at 0 and converges to 3.0 (printed × 1000 because the
built-in number printer is integer-only). If this doesn't produce
3000, something is wrong with your floating-point setup before you
even get to the Transformer.
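For comparison, the same experiment in plain Python looks roughly like this. The step size of 0.01 is an assumption (the README does not state the value the self-test uses), and rounding before printing is a choice made here; the point is only that 3,000 steps of gradient descent on this quadratic land on 3.0 to within floating-point precision.

```python
x = 0.0
lr = 0.01                      # assumed step size; the value dambun uses is not stated
print("start  x * 1000 =", round(x * 1000))   # 0
for _ in range(3000):
    grad = 2.0 * (x - 3.0)     # d/dx (x - 3)^2
    x -= lr * grad
print("finish x * 1000 =", round(x * 1000))   # 3000
```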
Rebuilding from source
# 1) Rebuild the compiler (self-host).
./ngunnga --bama ngunnga-new ngunnga.kuku
chmod +x ngunnga-new
# ngunnga-new should be byte-for-byte identical to ngunnga.
# 2) Rebuild the GPT.
./ngunnga-new --bama dambun-new dambun.kuku
chmod +x dambun-new
--bama means "produce a Linux ELF." Without the flag the compiler
emits a bare-metal kernel ELF for the kuku-os boot target.
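The comment in step 1 expects the rebuilt compiler to match the shipped one exactly. One way to check that, sketched here in Python with a hypothetical helper name, is a full content comparison of the two files:

```python
import filecmp


def same_bytes(path_a: str, path_b: str) -> bool:
    """True when both files have identical contents (not just matching metadata)."""
    return filecmp.cmp(path_a, path_b, shallow=False)
```

After step 1, same_bytes("ngunnga", "ngunnga-new") returning True would confirm the self-hosted build reproduced the shipped binary.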
The kuku words, briefly
| word | role |
|---|---|
| balkalaway … kunbayn | function definition / end |
| yabarrka / janay | loop / break |
| yala / yinya | if / else |
| kujil / wuljil / wundil | dup / drop / swap (data stack) |
| muru / dumbarril | int add / subtract |
| bana-* | f64 (FPU) variants: bana-muru, bana-*, bana-/, bana-sqrt, bana-log, bana-exp, … |
| @ / ! | load / store (int); bana-@ / bana-! load/store FP64 |
| binalku | training subcommand |
| balkalaway (as subcommand) | inference subcommand |
| jurra | softmax |
| jalkar | RMSnorm |
| miyil | attention |
| dukurr | Adam optimiser |
| wumba | RNG |
| bayan | the model |
| bana | floats |
License
MIT.