[I know this is a bad model and the architecture is overkill for the use case, but I'm keeping this repo for nostalgia: it was my first LLM experiment.]

A GPT-2 Small model trained on drawings from Google's "Quick, Draw!" game, with each drawing converted to a 24x24 grid of text characters.

In the training data, each image was written in the following format:

<MSGSTART>
apple
<IMGSTART>
111111111111111111111111
111111111111111111111111
111111111111111111111111
111111111111111111111111
111111111100111111111111
111111111100111111111111
111111111100111111111111
111110000000011111111111
111100000000000111111111
111000111111100000011111
110001111111111111000111
110011111111111111110001
110011111111111111111001
110011111111111111111001
111011111111111111110001
111001111111111111100011
111000111111111111100111
111110001111111111000111
111111000111111110001111
111111111000000000011111
111111111110000000111111
111111111111111111111111
111111111111111111111111
111111111111111111111111
<IMGEND>
apple
<MSGEND>
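For reference, here is a minimal Python sketch that wraps a 24x24 grid in this layout. It is not the script used to build the dataset. The helper name to_training_text is hypothetical, and reading "0" as a stroke pixel and "1" as background is an assumption based on the apple example.

```python
# Sketch of the training-text layout shown above (not the original dataset script).
# Assumption: "0" marks stroke pixels and "1" marks background, as in the apple example.

GRID_SIZE = 24


def to_training_text(label: str, rows: list[str]) -> str:
    """Wrap a 24x24 grid of '0'/'1' rows in the marker tokens, one item per line."""
    if len(rows) != GRID_SIZE or any(len(r) != GRID_SIZE for r in rows):
        raise ValueError("expected 24 rows of 24 characters")
    if any(ch not in "01" for r in rows for ch in r):
        raise ValueError("rows may only contain '0' and '1'")
    # The label appears twice: before the image and again after <IMGEND>.
    lines = ["<MSGSTART>", label, "<IMGSTART>", *rows, "<IMGEND>", label, "<MSGEND>"]
    return "\n".join(lines)
```

For example, to_training_text("line", ["1" * 12 + "0" + "1" * 11 for _ in range(24)]) produces a 30-line sample with a single vertical stroke.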

The model can both generate such images and recognise them.

When the input is the name of an object followed by <IMGSTART>, the model outputs an image of that object.

When the input is a partially drawn image, the model completes the rest of it.

When the input is an image followed by the <IMGEND> tag, the next generated token is the name of the object in the image.
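The three kinds of input can be built with small helpers like the ones below. This is a sketch: the function names are hypothetical. Starting the generation prompt with <MSGSTART> (as every training sample did) and starting image-only prompts with <IMGSTART> are assumptions, not tested behaviour. Newlines are already replaced with spaces, matching the training data.

```python
from typing import List, Optional


def _flatten(parts: List[str]) -> str:
    # Training text used spaces instead of newlines.
    return " ".join(parts)


def generation_prompt(label: str) -> str:
    """Object name followed by <IMGSTART>: the model should draw the object."""
    # Assumption: leading <MSGSTART>, as in every training sample.
    return _flatten(["<MSGSTART>", label, "<IMGSTART>"])


def completion_prompt(partial_rows: List[str], label: Optional[str] = None) -> str:
    """A partially drawn image: the model should add the remaining rows."""
    # With a label the prompt mirrors the training layout; without one it is just the image.
    head = ["<MSGSTART>", label] if label else []
    return _flatten([*head, "<IMGSTART>", *partial_rows])


def recognition_prompt(rows: List[str]) -> str:
    """A full image followed by <IMGEND>: the next token should be the label."""
    return _flatten(["<IMGSTART>", *rows, "<IMGEND>"])
```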

During training, newlines were replaced with spaces. So replace newlines with spaces in any input you send to the model, and do the opposite on its output. In Python, for example, .replace("\n", " ") on the input and .replace(" ", "\n") on the output is enough.
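A blanket .replace(" ", "\n") on the output also splits multi-word labels such as "ice cream", so it can help to pull the image rows and the label out separately. This is a minimal sketch with hypothetical helper names. It assumes the decoded output keeps the marker tokens and that rows are space-separated runs of 24 "0"/"1" characters; malformed rows are silently dropped.

```python
import re
from typing import List, Optional

# A valid image row: exactly 24 characters, each "0" or "1".
ROW_PATTERN = re.compile(r"[01]{24}")


def to_model_input(text: str) -> str:
    """Flatten multi-line input the way the training text was flattened."""
    return text.replace("\n", " ")


def extract_rows(output: str) -> List[str]:
    """Return the 24-character rows between <IMGSTART> and <IMGEND>.

    If <IMGEND> is missing (e.g. generation stopped early), everything after
    <IMGSTART> is scanned. Tokens that are not valid rows are dropped.
    """
    start = output.find("<IMGSTART>")
    if start == -1:
        return []
    start += len("<IMGSTART>")
    end = output.find("<IMGEND>", start)
    body = output[start:] if end == -1 else output[start:end]
    return [tok for tok in body.split() if ROW_PATTERN.fullmatch(tok)]


def label_after_image(output: str) -> Optional[str]:
    """Return the text between <IMGEND> and <MSGEND> (or the end), if any."""
    end = output.find("<IMGEND>")
    if end == -1:
        return None
    rest = output[end + len("<IMGEND>"):]
    stop = rest.find("<MSGEND>")
    if stop != -1:
        rest = rest[:stop]
    return rest.strip() or None
```

To display a generated drawing, print("\n".join(extract_rows(output))).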

<MSGSTART>, <IMGSTART>, <IMGEND> and <MSGEND> were registered as special tokens when the tokenizer was trained.
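Because the four markers are special tokens, each should map to a single token id instead of being split into sub-word pieces. Here is a quick check, written as a hypothetical helper. It takes any encode function so that it does not depend on how the tokenizer is loaded.

```python
from typing import Callable, List

SPECIAL_TOKENS = ["<MSGSTART>", "<IMGSTART>", "<IMGEND>", "<MSGEND>"]


def find_split_markers(encode: Callable[[str], List[int]]) -> List[str]:
    """Return the markers that do not encode to exactly one token id.

    An empty result means every marker is a single special token.
    """
    return [tok for tok in SPECIAL_TOKENS if len(encode(tok)) != 1]
```

With a Hugging Face tokenizer, find_split_markers(lambda s: tokenizer.encode(s, add_special_tokens=False)) should return an empty list if the markers are registered as described.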

Recommended generation parameters:

max_length = 250

temperature = 0.9 (at most)

top_k = 50

top_p = 0.95

do_sample = True

skip_special_tokens = False (when decoding the output, so the marker tokens are kept)
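With Hugging Face transformers, these settings map onto model.generate() and tokenizer.decode() roughly as shown below. This sketch assumes the model and tokenizer load with AutoModelForCausalLM and AutoTokenizer; run_model is a hypothetical helper.

```python
# Sketch: the recommended settings as Hugging Face transformers arguments.
# Assumption: model and tokenizer were loaded with AutoModelForCausalLM / AutoTokenizer.

GENERATION_KWARGS = {
    "max_length": 250,   # in transformers this counts the prompt tokens too
    "temperature": 0.9,  # recommended upper bound
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}


def run_model(model, tokenizer, prompt: str) -> str:
    """Sample a continuation and decode it with the marker tokens kept.

    The prompt should already have newlines replaced with spaces.
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **GENERATION_KWARGS)
    # skip_special_tokens is a decode() option, not a generate() option.
    return tokenizer.decode(output_ids[0], skip_special_tokens=False)
```

For example, load with tokenizer = AutoTokenizer.from_pretrained(path) and model = AutoModelForCausalLM.from_pretrained(path), where path points at wherever the model files are stored. Then call run_model(model, tokenizer, "apple <IMGSTART>"). If long prompts such as partial images leave too little room, max_new_tokens limits only the newly generated tokens.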