Gemma4-26B-1.1B-tiny

A tiny version of google/gemma-4-26B-A4B for testing and development.

Model Details

Base Model: google/gemma-4-26B-A4B
Architecture: gemma4 (multimodal vision-language with Mixture of Experts)
Total Parameters: 1.04B
Activated Parameters: 0.89B (MoE with top-k=8 out of 16 experts)
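
The gap between total and activated parameters can be sanity-checked against the Tiny column of the configuration table in the next section. The sketch below is a rough estimate, not the method used to produce the figures above. It assumes each expert is a gated MLP with three weight matrices of shape hidden size × MoE intermediate size, and that all 6 text layers contain an MoE block. Neither assumption is stated in this card.

```python
# Rough consistency check of the activated-parameter figure.
# Assumptions (not confirmed by the card): each expert is a gated MLP with
# three hidden_size x moe_intermediate_size matrices, and all 6 text layers
# contain an MoE block.
HIDDEN_SIZE = 2048            # Tiny text model hidden size
MOE_INTERMEDIATE_SIZE = 512   # Tiny MoE intermediate size
NUM_LAYERS = 6                # Tiny text model hidden layers
NUM_EXPERTS = 16
TOP_K = 8
TOTAL_PARAMS = 1.04e9         # rounded total from the card

params_per_expert = 3 * HIDDEN_SIZE * MOE_INTERMEDIATE_SIZE
expert_params = params_per_expert * NUM_EXPERTS * NUM_LAYERS

# Experts that the router does not select contribute no compute for a token.
inactive_params = expert_params * (NUM_EXPERTS - TOP_K) // NUM_EXPERTS
activated_estimate = TOTAL_PARAMS - inactive_params

print(f"expert parameters:    {expert_params / 1e9:.2f}B")
print(f"activated (estimate): {activated_estimate / 1e9:.2f}B")
```

Under these assumptions the experts hold about 0.30B parameters and the activated estimate comes out at about 0.89B, which is consistent with the figure reported above.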

Configuration Comparison

Parameter	Original	Tiny
Text Model
Hidden Layers	30	6
Layer Types	[5× sliding, 1× full] × 5	[5× sliding, 1× full] × 1
Hidden Size	2816	2048
Intermediate Size	2112	1536
Attention Heads	16	16
KV Heads	8	8
Global KV Heads	2	2
Head Dimension	256	128
Global Head Dimension	512	256
MoE
Num Experts	128	16
Top-K Experts	8	8
MoE Intermediate Size	704	512
Vision Model
Hidden Layers	27	6
Hidden Size	1152	768
Intermediate Size	4304	2048
Attention Heads	16	12
KV Heads	16	12
Head Dimension	72	64
Global Head Dimension	72	64
Common
Vocab Size	262144	262144
Max Position Embeddings	262144 (text), 131072 (vision)	262144 (text), 131072 (vision)
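
The "Layer Types" row describes a repeating pattern: five sliding-attention layers followed by one full-attention layer. A minimal sketch of how that pattern expands into a per-layer list is shown below. The helper name `build_layer_types` is hypothetical, and the label strings are taken from the attention types named in the Checkpoint Structure section; the exact field layout in the real config is not shown in this card.

```python
def build_layer_types(num_layers, sliding_per_full=5):
    """Repeat `sliding_per_full` sliding layers followed by one full layer."""
    period = sliding_per_full + 1
    return [
        "full_attention" if (i + 1) % period == 0 else "sliding_attention"
        for i in range(num_layers)
    ]


original = build_layer_types(30)  # [5x sliding, 1x full] x 5
tiny = build_layer_types(6)       # [5x sliding, 1x full] x 1

print(tiny)
print(original.count("full_attention"), "full-attention layers in the original")
```

Both models keep the same 5:1 ratio; the tiny model simply contains one repetition of the pattern instead of five.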

Checkpoint Structure

The model is saved as a single safetensors file (model.safetensors) that contains all weights. The architecture keeps the structure of the original Gemma4 model, including:

Vision embedding projection
Language model with text layers (MoE + standard MLP)
Mixed attention types (sliding_attention for local context, full_attention for global context)
Router for MoE expert selection
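
One way to confirm that these components are present is to list the tensor names in model.safetensors and group them by prefix. The sketch below uses `safe_open` from the safetensors library to read names without loading weights. The tensor names in `example_names` are hypothetical placeholders for illustration; the real names in this checkpoint may differ.

```python
from collections import Counter


def summarize_tensor_names(names, depth=2):
    """Count tensors per name prefix (first `depth` dot-separated parts)."""
    return Counter(".".join(name.split(".")[:depth]) for name in names)


def read_tensor_names(path="model.safetensors"):
    """List tensor names in a safetensors file without loading the weights."""
    # Imported lazily; requires the `safetensors` package and PyTorch.
    from safetensors import safe_open

    with safe_open(path, framework="pt") as f:
        return list(f.keys())


# Hypothetical names for illustration only; use read_tensor_names() on the
# downloaded checkpoint to get the real ones.
example_names = [
    "model.language_model.layers.0.self_attn.q_proj.weight",
    "model.language_model.layers.0.router.weight",
    "model.vision_tower.layers.0.mlp.fc1.weight",
    "model.embed_vision.projection.weight",
]
summary = summarize_tensor_names(example_names)
for prefix, count in sorted(summary.items()):
    print(prefix, count)
```

On a real checkpoint, `summarize_tensor_names(read_tensor_names())` would give a quick overview of how many tensors belong to each top-level component.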

Validation

The model has been validated to:

Load successfully with AutoModelForCausalLM.from_pretrained()
Reach a low reported perplexity on the training data (0.95; see Fine-tuning Results)
Generate coherent text completions
Reach an inference perplexity of 1.21

Fine-tuning Results

The model was fine-tuned on a toy dataset of internet copypastas:

Initial perplexity: ~12.5
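Final perplexity: 0.95 (target: 3.0). Perplexity cannot fall below 1, so this figure may be a cross-entropy loss rather than a true perplexity.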
Training steps: 200
The model successfully memorized the training data and generates appropriate completions
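
When reading the perplexity figures in this card, it may help to recall the usual definition: perplexity is the exponential of the mean per-token negative log-likelihood, which makes 1.0 its lower bound. The sketch below illustrates that relationship with made-up loss values. Whether the training script behind this card computed perplexity this way is not stated, so treat it as a reference definition only.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


# Illustrative values only, not measurements from this model.
certain = perplexity([0.0, 0.0, 0.0])  # every token predicted with probability 1
typical = perplexity([0.95, 0.95])     # mean loss of 0.95 nats

print(f"perfect prediction: {certain:.2f}")
print(f"mean loss 0.95:     {typical:.2f}")
```

A model that predicts every token with certainty reaches a perplexity of exactly 1, and a mean loss of 0.95 nats corresponds to a perplexity of about 2.59.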

Example Generation

Prompt: "According to all known laws"
Output: "According to all known laws of aviation, there is no way a bee should be able to fly. Its wings are too small to get its fat little body off the ground. The bee, of course, flies anyway because bees don't care what humans think is impossible."

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inference-optimization/Gemma4-26B-1.1B-tiny"

# Load the weights; device_map="auto" places them on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Tokenize the prompt and move the tensors to the model's device.
prompt = "According to all known laws"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate up to 100 new tokens after the prompt, then decode to text.
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Additional Notes

This is a tiny model for testing purposes only; it is not trained for production use.
The model maintains the architectural characteristics of Gemma4, including:
- Mixture of Experts (MoE) with sparse activation
- Mixed attention patterns (sliding + full attention)
- Vision-language capabilities (the vision components were not fine-tuned)
Useful for:
- Testing quantization and compression techniques
- Validating transformers integration
- Development and debugging without large model overhead
The model uses the same tokenizer and processor as the base model.
Vision capabilities are present but were not validated, because fine-tuning used a text-only dataset.
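
The sparse activation mentioned in the notes above means that each token is processed by only the top-k experts chosen by the router (8 of 16 in this model). The sketch below shows one common form of top-k routing in plain Python. The function `route_top_k` is hypothetical, and normalizing with a softmax over the selected logits is an assumption; the actual Gemma4 router may score and normalize experts differently.

```python
import math


def route_top_k(router_logits, top_k=8):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    ranked = sorted(
        range(len(router_logits)), key=lambda i: router_logits[i], reverse=True
    )
    chosen = ranked[:top_k]
    # Softmax over the selected logits only, so the weights sum to 1.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Made-up router scores for one token across 16 experts.
logits = [0.1 * i for i in range(16)]
selected = route_top_k(logits)

for expert, weight in selected:
    print(f"expert {expert:2d}  weight {weight:.3f}")
```

Only the selected experts run for that token, and their outputs would be combined using the returned weights; the remaining experts are skipped.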

License

Same as the base model: Gemma License

Created With

This model was created using the llm-compressor create-tiny-model skill.

Safetensors

Model size: 1B params
Tensor type: BF16
