Tiny Model Golf: Easy Pickings If You Know What You're Doing

Simple rules. Small models. Real constraints.

No pretending that bigger automatically means smarter.

The challenge is straightforward:

Build the best language model you can under 100M parameters, with at least a 1028-token context window.
Open source it. Document it. Submit the repo.

That’s the game.

The Judging Starts Where It Should: Output Quality

Can the model respond correctly?

Can it continue a prompt without faceplanting?

Can it actually do the thing instead of producing fluent fog?

That is the first filter, and it is the right one.

After that, the models get thrown into the benchmark grinder:

WikiText-2 CE loss
BLiMP
ARC-Easy
BPB

So this is not just vibes.

You need a model that can speak, generalize, compress language, and avoid collapsing when the evals get real.
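Two of those metrics are the same measurement in different units, which is worth internalizing before you compare numbers across tokenizers. Here is a minimal sketch of the conversion. It assumes the cross-entropy is a mean in nats per token and that bytes means UTF-8 bytes of the eval text. The contest's own harness may define either one differently, so treat the function name and arguments as illustrative.

```python
import math

def bits_per_byte(ce_nats_per_token, n_tokens, n_bytes):
    """Convert mean cross-entropy (nats/token) into bits per byte.

    Assumption: n_bytes is the UTF-8 byte length of the same text
    that was tokenized into n_tokens tokens.
    """
    total_bits = ce_nats_per_token * n_tokens / math.log(2)  # nats -> bits
    return total_bits / n_bytes

def perplexity(ce_nats_per_token):
    """Per-token perplexity for the same cross-entropy value."""
    return math.exp(ce_nats_per_token)

# Hypothetical numbers, not contest results: a CE of 3.0 nats/token
# on text that averages 4 bytes per token.
bpb = bits_per_byte(3.0, n_tokens=1000, n_bytes=4000)
print(f"BPB: {bpb:.3f}  perplexity: {perplexity(3.0):.2f}")
```

The point of BPB is that it normalizes by bytes, so a tokenizer that emits fewer tokens cannot lower the score just by changing the token count.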

The Easy Part

There are only a handful of entries right now.

Possibly four... dunno.

That means if you are a master, or even just a very stubborn person with a good training recipe, the RunPod credits are sitting right there. Not guaranteed, but still nice. This is not some thousand-entry bloodbath where you need a research lab and an animal sacrifice. It is a small field with clear rules and a prize that rewards showing up with something neat.

What I Would Do

If I were entering, I wouldn't try to be cute.

I would get the basics brutally clean:

tokenizer
data mix
loss curve
parameter count script
eval harness
model card that does not read like a ransom note
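The parameter count script is the cheapest item on that list, so here is a rough sketch of one. It works on a plain dict of tensor names and shapes rather than a real framework checkpoint, and it skips tied weights so they are not counted twice. Whether the contest counts tied embeddings once is an assumption on my part, and every layer name and shape below is made up for illustration.

```python
from math import prod

PARAM_LIMIT = 100_000_000  # the "under 100M" rule from the challenge

def count_parameters(shapes, tied=()):
    """Sum element counts over named shapes, skipping tied duplicates.

    shapes: dict mapping tensor name -> shape tuple
    tied:   names that share storage with another entry (counted once)
    """
    skip = set(tied)
    return sum(prod(shape) for name, shape in shapes.items() if name not in skip)

# Hypothetical one-block toy model, only to exercise the counter.
shapes = {
    "tok_emb.weight": (8192, 512),
    "block0.attn.qkv.weight": (1536, 512),
    "block0.attn.out.weight": (512, 512),
    "block0.mlp.up.weight": (2048, 512),
    "block0.mlp.down.weight": (512, 2048),
    "lm_head.weight": (8192, 512),  # assumed tied to tok_emb.weight
}

total = count_parameters(shapes, tied=["lm_head.weight"])
print(f"{total:,} parameters, under limit: {total < PARAM_LIMIT}")
```

With a real framework you would build the same dict from the model's state and let the script print the number you put in the model card.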

Then I would start getting weird.

Tiny MoE?

No thanks, maybe a tiny generalizer.

Better tokenizer?

Sure.

Strange memory block?

Mhmm.

Some Frankenstein architecture you built at 3am because attention annoyed you personally?

Yes please.

The 100M Parameter Limit

You can't just scale your way out of bad choices. You have to decide what actually matters.

Vocabulary size
Context handling
Training data
Optimizer

The boring details suddenly become the muse. That's what I like about tiny locally trainable models.
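To make the vocabulary tradeoff concrete, here is a back-of-envelope sketch of how many transformer blocks fit once the embedding table is paid for. It assumes a plain decoder with roughly 12 * d_model^2 weights per block (attention plus a 4x MLP), tied input and output embeddings, and it ignores biases, norms, and positional parameters. Those are my simplifications, not contest rules, and the vocabulary sizes are just examples.

```python
def max_layers(budget, vocab_size, d_model, tied_embeddings=True):
    """Estimate how many decoder blocks fit under a parameter budget.

    Assumes ~12 * d_model**2 weights per block:
    4 * d^2 for attention (Q, K, V, output) + 8 * d^2 for a 4x MLP.
    Returns (number_of_blocks, embedding_parameters).
    """
    emb = vocab_size * d_model * (1 if tied_embeddings else 2)
    per_block = 12 * d_model * d_model
    blocks = max(0, (budget - emb) // per_block)
    return blocks, emb

BUDGET = 100_000_000  # the challenge limit

# Same width, two hypothetical vocabulary sizes.
for vocab in (50_257, 8_192):
    blocks, emb = max_layers(BUDGET, vocab, d_model=768)
    print(f"vocab={vocab:>6}: embeddings={emb:,} -> about {blocks} blocks")
```

Under these assumptions, at width 768 a 50,257-entry vocabulary leaves room for about 8 blocks, while an 8,192-entry one leaves room for about 13. Smaller vocabularies also mean more tokens per byte, so depth is not free either.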

Tiny Model Golf is not asking who has the biggest GPU pile.

It's asking who can make the smartest tradeoffs under pressure.

Also, who knows: your entry could turn out to be Kaggle quality, and you might walk away with a working prototype for a future Kaggle competition. The prize is $50 in RunPod credits.

Dates

Round one runs June 1–30, 2026.

Submissions close June 30 at 23:59 UTC.

If you know small models, this is easy pickings.

Go wild and build something annoyingly good, or disgusting.

Crownelius/Tiny-Model-Golf