dambun: a GPT from scratch, in a stack language

A character-level Transformer language model, trained and sampled by a single statically-linked 32-bit ELF that was assembled by a self-hosted compiler for a tiny stack language called kuku.

No Python. No C. No libc. No imports. No pretrained anything. Just Linux int 0x80 syscalls, x87 FPU doubles, and ~1,750 lines of source that compile down to a 32 kB ELF.

Upstream: github.com/australia/kukuos.


What's in the box

file                    size     what it is
ngunnga.kuku            44 KB    source of the kuku compiler (self-hosting)
ngunnga                 24 KB    the compiler, as a 32-bit Linux ELF
dambun.kuku             61 KB    source of the GPT
dambun                  32 KB    the GPT, as a 32-bit Linux ELF
dambun.kuku-gpt-model   262 KB   trained weights (4,192 params, FP64) + Adam state
names.txt               222 KB   training corpus: 32,032 lowercase names

Architecture

Identical shape to Karpathy's minGPT / microGPT:

  • vocab_size = 27 (26 lowercase letters + BOS)
  • n_embd = 16
  • n_layer = 1
  • n_head = 4, head_dim = 4
  • block_size = 16
  • one block: pre-RMSnorm → multi-head self-attention with KV cache → residual → pre-RMSnorm → 64-hidden MLP (ReLU) → residual → lm_head
  • cross-entropy per position, averaged across positions
  • Adam (β₁=0.85, β₂=0.99, ε=1e-8)
  • Total parameters: 4,192 (all FP64, allocated as 48-byte Value cells)
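The 4,192 figure can be reproduced from the hyper-parameters listed above. The sketch below is only an accounting check, written in Python for readability. It assumes a particular breakdown that this README does not spell out: no bias terms, no learnable RMSnorm gains, and an lm_head that is separate from the token embedding. The dictionary keys are descriptive labels, not names from dambun.kuku.

```python
# Hypothetical parameter breakdown. Assumptions (not stated in the README):
# no biases, no learnable norm gains, lm_head not tied to the token embedding.
vocab_size, n_embd, n_layer, block_size, mlp_hidden = 27, 16, 1, 16, 64

shapes = {
    "token embedding": vocab_size * n_embd,                   # 27 x 16
    "position embedding": block_size * n_embd,                # 16 x 16
    "attention q, k, v, out": n_layer * 4 * n_embd * n_embd,  # four 16 x 16 matrices
    "MLP up + down": n_layer * 2 * n_embd * mlp_hidden,       # 16 x 64 and 64 x 16
    "lm_head": vocab_size * n_embd,                           # 27 x 16
}

total = sum(shapes.values())
for name, count in shapes.items():
    print(f"{name:24s} {count:5d}")
print(f"{'total':24s} {total:5d}")  # total 4192
```

Under those assumptions the sum is 432 + 256 + 1,024 + 2,048 + 432 = 4,192, which matches the stated total.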

Built from the FPU up: a 48-byte Value struct with .data, .grad, .child1/2, and local gradients; a topologically-sorted reverse pass driven by bana-@/bana-! FPU loads/stores; a bump allocator for the per-step graph; Gaussian init via Box–Muller seeded from /dev/urandom.
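For readers who have not seen a scalar autograd engine, here is a rough Python sketch of the idea described above: each node stores its value, its gradient, up to two children, and the local gradient with respect to each child, and the reverse pass walks the nodes in reverse topological order. The class layout and method names are illustrative assumptions; the real implementation is the 48-byte Value cell in dambun.kuku.

```python
class Value:
    """Scalar graph node: data, grad, children, and local gradients."""

    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.children = children        # at most two nodes in this sketch
        self.local_grads = local_grads  # d(self)/d(child), one per child

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Build a topological order (children before parents) by depth-first search.
        topo, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for child in node.children:
                    visit(child)
                topo.append(node)

        visit(self)
        self.grad = 1.0
        # Reverse pass: push each node's gradient down to its children.
        for node in reversed(topo):
            for child, local in zip(node.children, node.local_grads):
                child.grad += local * node.grad


a, b = Value(2.0), Value(-3.0)
loss = a * b + a * a   # d(loss)/da = b + 2a = 1, d(loss)/db = a = 2
loss.backward()
print(a.grad, b.grad)  # 1.0 2.0
```

Accumulating with `+=` matters because `a` is reached through more than one path in the graph.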


Requirements

The binaries are 32-bit x86 ELFs built for Linux. On a 64-bit host you need IA32 emulation enabled (CONFIG_IA32_EMULATION=y, default on Ubuntu). On most modern distros this just works; no sudo apt install needed.

chmod +x ngunnga dambun
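If a binary refuses to start, it can help to confirm that it really is a 32-bit x86 ELF. The following Python sketch reads the relevant ELF header fields (EI_CLASS at byte 4, EI_DATA at byte 5, e_machine at bytes 18 to 19, where 3 is EM_386). The function name `describe_elf` is made up for this example, and the demonstration runs on a synthetic header rather than on the shipped files.

```python
import struct


def describe_elf(header: bytes):
    """Return (word size in bits, e_machine) from the first 20 bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}[header[4]]          # EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64
    endian = "<" if header[5] == 1 else ">"   # EI_DATA: 1 = little-endian
    machine = struct.unpack_from(endian + "H", header, 18)[0]
    return bits, machine


# Synthetic header for illustration: 16-byte e_ident, then e_type=2, e_machine=3 (EM_386).
fake = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9) + struct.pack("<HH", 2, 3)
print(describe_elf(fake))  # (32, 3)

# On a real file (path assumed to exist):
#   with open("dambun", "rb") as f:
#       print(describe_elf(f.read(20)))
```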

Training

./dambun binalku names.txt
  • Command: binalku means "remember/learn"; this is the training subcommand.

  • Argument: a path to a text file with one lowercase name per line (26-letter alphabet; uppercase and other characters are tokenised as BOS). We ship names.txt (32k US first names, the standard makemore/minGPT benchmark).

  • What it does:

    1. Seeds the RNG from /dev/urandom.
    2. Allocates 4,192 Value cells in the persistent params pool and Gauss-initialises them with std 0.08.
    3. Opens names.txt and reads it into memory.
    4. For N training steps (default 100), takes the next line, tokenises [BOS, c₁, …, cₙ, BOS], runs one forward + backward + Adam step.
    5. Every 10 steps prints warri {step} birru*1000 {loss × 1000} to stdout.
    6. Saves {params, Adam m, Adam v} to /tmp/dambun.kuku-gpt-model as a flat 262 KB blob of IEEE-754 doubles.
  • Expected output on names.txt:

    bana mana, binalku jakalbaku...
    warri 0   birru*1000 3296
    warri 10  birru*1000 1879
    warri 20  birru*1000 1412
    ...
    warri 90  birru*1000 1006
    bayan balkal kunbayn.
    

    Cross-entropy falls from ≈ ln(27) ≈ 3.30 (the loss of a uniform guess over 27 tokens) to about 1.01 in 100 steps at the default lr = 1e-3. Wall clock on one x86 core: ≈ 1 s.

  • To change the step count or hyper-parameters: they're compiled into the binary. Edit dambun.kuku (100 binalku-narmba near the bottom; kuku-lr near the top) and recompile with ./ngunnga --bama dambun dambun.kuku.

  • If you already have the trained weights in this folder, skip training: cp dambun.kuku-gpt-model /tmp/ and jump to inference.
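Two pieces of the training step described above are easy to sketch in isolation: the tokenisation and the Adam update. The Python below is an illustration under stated assumptions, not a transcription of dambun.kuku. The token ids (a..z as 0..25, BOS as 26) are assumed, as is the use of the standard bias-corrected form of Adam; only the hyper-parameter values come from this README.

```python
import math

BOS = 26  # assumed id; letters a..z are assumed to take ids 0..25


def tokenise(name):
    """Wrap a name in BOS markers; anything outside a-z is mapped to BOS."""
    ids = [ord(c) - ord("a") if "a" <= c <= "z" else BOS for c in name]
    return [BOS] + ids + [BOS]


def adam_step(p, g, m, v, t, lr=1e-3, b1=0.85, b2=0.99, eps=1e-8):
    """One bias-corrected Adam update for a single scalar parameter (t starts at 1)."""
    m = b1 * m + (1.0 - b1) * g        # first-moment estimate
    v = b2 * v + (1.0 - b2) * g * g    # second-moment estimate
    m_hat = m / (1.0 - b1 ** t)        # bias correction (assumed to be used)
    v_hat = v / (1.0 - b2 ** t)
    p -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v


print(tokenise("emma"))        # [26, 4, 12, 12, 0, 26]
print(round(math.log(27), 4))  # 3.2958, the loss of a uniform guess over 27 tokens

p, m, v = adam_step(0.0, 1.0, 0.0, 0.0, t=1)
print(p)  # very close to -lr on the first step, whatever the gradient scale
```

The first-step behaviour is a useful sanity check: with bias correction, the very first update moves each parameter by roughly lr in the direction opposite to the gradient's sign.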


Inference (sampling)

./dambun balkalaway
  • Command: balkalaway means "let us speak together"; this is the inference subcommand.

  • Prerequisite: /tmp/dambun.kuku-gpt-model must exist (either from binalku above or copied from this folder).

  • What it does:

    1. Seeds the RNG from /dev/urandom.
    2. Allocates a fresh Value arena and loads weights from disk.
    3. Sets temperature to 1.0 (edit dambun.kuku to change).
    4. For 20 iterations:
      • Start with [BOS] at position 0.
      • At each of up to 16 positions, run the forward pass to get 27 output logits, divide by temperature, softmax, and sample the next character from the multinomial.
      • Stop when BOS is produced again, or after 16 characters.
      • Print the sampled name (without the BOS markers) and a newline.
  • Expected output (with the trained model that ships in this repo):

    lauosvzlu
    llllfka
    llt
    llab
    lllllalkmll
    ...
    

    A 4,192-parameter model on a character-level task, trained with single-example (batch size 1) Adam updates, produces character bigrams, not English names. This is a faithful port of a tiny teaching model; the training loss curve and the sampling mechanics are what's being demonstrated.
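The sampling loop described above (temperature-scaled softmax, multinomial draw, stop on BOS or after 16 characters) can be sketched in a few lines of Python. Here `next_logits` is a hypothetical stand-in for the forward pass, and the BOS id of 26 is an assumption; the toy model at the bottom just returns uniform logits so the example runs on its own.

```python
import math
import random


def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_name(next_logits, rng, bos=26, max_len=16, temperature=1.0):
    """Sample one name. next_logits(tokens) -> 27 logits stands in for the model."""
    tokens = [bos]
    letters = []
    for _ in range(max_len):
        probs = softmax(next_logits(tokens), temperature)
        tok = rng.choices(range(len(probs)), weights=probs)[0]  # multinomial draw
        if tok == bos:
            break  # BOS doubles as the end-of-name marker
        tokens.append(tok)
        letters.append(chr(ord("a") + tok))
    return "".join(letters)


# Toy stand-in model: uniform logits over the 27 tokens.
rng = random.Random(0)
name = sample_name(lambda tokens: [0.0] * 27, rng)
print(name)
```

Lowering the temperature below 1.0 sharpens the distribution towards the most likely character; raising it flattens the distribution.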


Autograd self-test

./dambun ngana-warri

Runs gradient descent on f(x) = (x − 3)² starting at x = 0 for 3,000 steps. Should print:

ngana-warri: jakalbaku x * 1000 = 0
ngana-warri: kunbayn    x * 1000 = 3000

i.e. x starts at 0 and converges to 3.0 (printed × 1000 because the built-in number printer is integer-only). If this doesn't produce 3000, something is wrong with your floating-point setup before you even get to the Transformer.
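As a reference point, the same experiment takes a few lines of Python. This sketch uses the analytic gradient 2(x − 3) instead of an autograd graph, and the learning rate of 0.01 is an assumption, since this README does not state the value the self-test uses. It also rounds before printing; whether the kuku printer rounds or truncates is not specified here.

```python
def descend(steps=3000, lr=0.01):
    """Plain gradient descent on f(x) = (x - 3)^2, starting from x = 0."""
    x = 0.0
    for _ in range(steps):
        grad = 2.0 * (x - 3.0)  # d/dx (x - 3)^2
        x -= lr * grad
    return x


x = descend()
print("x * 1000 =", round(x * 1000))  # x * 1000 = 3000
```

With this step size the error shrinks by a factor of 0.98 per step, so 3,000 steps is far more than enough to reach 3.0 to within double-precision rounding.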


Rebuilding from source

# 1) Rebuild the compiler (self-host).
./ngunnga --bama ngunnga-new ngunnga.kuku
chmod +x ngunnga-new
# ngunnga-new should be byte-for-byte identical to ngunnga.

# 2) Rebuild the GPT.
./ngunnga-new --bama dambun-new dambun.kuku
chmod +x dambun-new

--bama means "produce a Linux ELF." Without the flag the compiler emits a bare-metal kernel ELF for the kuku-os boot target.
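The self-hosting claim in step 1 is that the rebuilt compiler matches the shipped one byte for byte. One way to check that is sketched below in Python, using the standard library's `filecmp` module. The helper name `same_bytes` is made up for this example, and the check only runs if both files are present in the current directory.

```python
import filecmp
import os


def same_bytes(path_a, path_b):
    """True if the two files have identical contents (full comparison, not just stat)."""
    return filecmp.cmp(path_a, path_b, shallow=False)


# Only meaningful after step 1 above has produced ngunnga-new.
if os.path.exists("ngunnga") and os.path.exists("ngunnga-new"):
    if same_bytes("ngunnga", "ngunnga-new"):
        print("fixed point: the compiler reproduces itself")
    else:
        print("mismatch: the rebuilt compiler differs from the shipped one")
```

Passing `shallow=False` forces a content comparison rather than relying on file size and modification time.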


The kuku words, briefly

word                          role
balkalaway … kunbayn          function definition / end
yabarrka / janay              loop / break
yala / yinya                  if / else
kujil / wuljil / wundil       dup / drop / swap (data stack)
muru / dumbarril              int add / subtract
bana-*                        f64 (FPU) variants: bana-muru, bana-*, bana-/, bana-sqrt, bana-log, bana-exp, …
@ / !                         load / store (int); bana-@ / bana-! load / store FP64
binalku                       training subcommand
balkalaway (as subcommand)    inference subcommand
jurra                         softmax
jalkar                        RMSnorm
miyil                         attention
dukurr                        Adam optimiser
wumba                         RNG
bayan                         the model
bana                          floats

License

MIT.
