Gemma4-26B-1.1B-tiny

A tiny version of google/gemma-4-26B-A4B for testing and development.

Model Details

  • Base Model: google/gemma-4-26B-A4B
  • Architecture: gemma4 (multimodal vision-language with Mixture of Experts)
  • Total Parameters: 1.04B
  • Activated Parameters: 0.89B (MoE routing selects the top 8 of 16 experts per token)
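The gap between total and activated parameters can be sanity-checked from the configuration values in this card. The sketch below is only an estimate under two assumptions that the card does not state: every one of the 6 text layers contains an MoE block, and each expert is a gated MLP with three weight matrices (gate, up, down) of size hidden × MoE-intermediate. The helper name `inactive_expert_params` is hypothetical.

```python
# Rough estimate of the parameters held by experts that are NOT selected
# for a given token.
# Assumptions (not stated in this card): every text layer has an MoE block,
# and each expert is a gated MLP with three matrices (gate, up, down),
# each of size hidden_size x moe_intermediate_size.

def inactive_expert_params(num_layers, num_experts, top_k,
                           hidden_size, moe_intermediate_size,
                           matrices_per_expert=3):
    per_expert = matrices_per_expert * hidden_size * moe_intermediate_size
    return num_layers * (num_experts - top_k) * per_expert

# Values for the tiny model, taken from the configuration comparison.
tiny_inactive = inactive_expert_params(
    num_layers=6, num_experts=16, top_k=8,
    hidden_size=2048, moe_intermediate_size=512,
)
print(f"{tiny_inactive:,} inactive expert parameters per token")
print(f"~{tiny_inactive / 1e9:.2f}B")
```

Under these assumptions the estimate is roughly 0.15B, which is in line with the difference between the 1.04B total and 0.89B activated parameters listed above.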

Configuration Comparison

| Parameter | Original | Tiny |
|---|---|---|
| Text Model | | |
| Hidden Layers | 30 | 6 |
| Layer Types | [5× sliding, 1× full] × 5 | [5× sliding, 1× full] × 1 |
| Hidden Size | 2816 | 2048 |
| Intermediate Size | 2112 | 1536 |
| Attention Heads | 16 | 16 |
| KV Heads | 8 | 8 |
| Global KV Heads | 2 | 2 |
| Head Dimension | 256 | 128 |
| Global Head Dimension | 512 | 256 |
| MoE | | |
| Num Experts | 128 | 16 |
| Top-K Experts | 8 | 8 |
| MoE Intermediate Size | 704 | 512 |
| Vision Model | | |
| Hidden Layers | 27 | 6 |
| Hidden Size | 1152 | 768 |
| Intermediate Size | 4304 | 2048 |
| Attention Heads | 16 | 12 |
| KV Heads | 16 | 12 |
| Head Dimension | 72 | 64 |
| Global Head Dimension | 72 | 64 |
| Common | | |
| Vocab Size | 262144 | 262144 |
| Max Position Embeddings | 262144 (text), 131072 (vision) | 262144 (text), 131072 (vision) |
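The "Layer Types" row describes a repeating block of five sliding-attention layers followed by one full-attention layer. As a minimal sketch, the per-layer list can be generated as below. The helper `build_layer_types` is hypothetical, and the exact form in which the real configuration stores this list is not shown in this card.

```python
# Expand the "[5x sliding, 1x full] x N" pattern into a per-layer list.
# build_layer_types is an illustrative helper, not part of any library.

def build_layer_types(num_blocks, sliding_per_block=5):
    block = ["sliding_attention"] * sliding_per_block + ["full_attention"]
    return block * num_blocks

original_layers = build_layer_types(num_blocks=5)  # 30 text layers
tiny_layers = build_layer_types(num_blocks=1)      # 6 text layers

print(len(original_layers), len(tiny_layers))
print(tiny_layers)
```

The tiny model keeps exactly one full repetition of the pattern, so both attention types are still exercised.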

Checkpoint Structure

The model is saved as a single safetensors file (model.safetensors) containing all weights. The architecture keeps the same components as the original Gemma4 model:

  • Vision embedding projection
  • Language model with text layers (MoE + standard MLP)
  • Mixed attention types (sliding_attention for local context, full_attention for global context)
  • Router for MoE expert selection
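To see which components are present in the single checkpoint file without loading any weights, the safetensors header can be read directly: the format starts with an 8-byte little-endian length followed by a JSON header that maps tensor names to dtype, shape, and data offsets. The sketch below uses only the standard library. The local path `model.safetensors` and the grouping depth are assumptions, and the actual tensor name prefixes in this checkpoint are not listed in this card.

```python
import json
import os
import struct
from math import prod

def read_safetensors_header(path):
    """Return the tensor entries from the JSON header of a .safetensors file."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header

def summarize(header, depth=2):
    """Group tensors by the first `depth` name components.

    Returns {prefix: (tensor_count, parameter_count)}.
    """
    groups = {}
    for name, info in header.items():
        prefix = ".".join(name.split(".")[:depth])
        count, params = groups.get(prefix, (0, 0))
        groups[prefix] = (count + 1, params + prod(info["shape"]))
    return groups

if __name__ == "__main__":
    path = "model.safetensors"  # assumed local download location
    if os.path.exists(path):
        summary = summarize(read_safetensors_header(path))
        for prefix, (count, params) in sorted(summary.items()):
            print(f"{prefix}: {count} tensors, {params:,} parameters")
```

Summing the parameter counts over all groups gives a quick cross-check against the total parameter figure reported for the model.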

Validation

The model has been validated to:

  • Load successfully with AutoModelForCausalLM.from_pretrained()
  • Achieve a low reported perplexity on the training data (0.95; see Fine-tuning Results)
  • Generate coherent text completions
  • Reach an inference perplexity of 1.21
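For reference, perplexity is conventionally the exponential of the mean negative log-likelihood per token. The sketch below shows that definition on made-up token probabilities. It is not the evaluation script used for this model, which the card does not include.

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural log) per token."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# Illustrative probabilities the model might assign to each correct next token.
log_probs = [math.log(p) for p in (0.9, 0.8, 0.95, 0.7)]
print(round(perplexity(log_probs), 3))

# A model that assigns probability 0.5 to every token has perplexity 2;
# one that is certain and correct on every token has perplexity 1,
# which is the lower bound of this definition.
print(perplexity([math.log(0.5)] * 3))
print(perplexity([0.0, 0.0]))
```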

Fine-tuning Results

The model was fine-tuned on a toy dataset of internet copypastas:

  • Initial perplexity: ~12.5
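  • Final perplexity: 0.95 as reported (target: 3.0). Perplexity computed as the exponential of the mean token loss cannot fall below 1, so this figure may be the loss value rather than a perplexity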
  • Training steps: 200
  • The model successfully memorized the training data and generates appropriate completions

Example Generation

Prompt: "According to all known laws"
Output: "According to all known laws of aviation, there is no way a bee should be able to fly. Its wings are too small to get its fat little body off the ground. The bee, of course, flies anyway because bees don't care what humans think is impossible."

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inference-optimization/Gemma4-26B-1.1B-tiny"

# Load the tiny checkpoint and its tokenizer from the Hub.
# device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Tokenize the prompt and move the tensors to the model's device.
prompt = "According to all known laws"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate up to 100 new tokens and decode the result.
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Additional Notes

  • This is a tiny model for testing only; it is not trained for production use
  • The model maintains the architectural characteristics of Gemma4 including:
    • Mixture of Experts (MoE) with sparse activation
    • Mixed attention patterns (sliding + full attention)
    • Vision-language capabilities (though vision components are not fine-tuned)
  • Useful for:
    • Testing quantization and compression techniques
    • Validating transformers integration
    • Development and debugging without large model overhead
  • The model uses the same tokenizer and processor as the base model
  • The vision components are present but were not exercised or validated during fine-tuning, which used a text-only dataset
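As an illustration of the sparse activation mentioned above, the following is a minimal sketch of generic top-k routing for a single token: softmax over the router logits, keep the k most probable experts, and renormalize their weights. This is a common formulation and not necessarily the exact router used by Gemma4 (the order of softmax and top-k selection, and the normalization, may differ). The function name `route_top_k` and the logits are made up.

```python
import math

def route_top_k(router_logits, top_k):
    """Pick top_k experts for one token and renormalize their weights."""
    # Softmax over all experts, shifted by the max for numerical stability.
    shift = max(router_logits)
    exps = [math.exp(x - shift) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the top_k most probable experts.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Renormalize so the selected weights sum to 1.
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}

# 16 experts with top-k = 8, matching the tiny configuration.
logits = [0.1 * i for i in range(16)]
weights = route_top_k(logits, top_k=8)
print(sorted(weights))  # indices of the selected experts
```

With 8 of 16 experts selected per token, half of the expert weights take part in each forward pass, compared with 8 of 128 in the original configuration.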

License

Same as the base model: Gemma License

Created With

This model was created using the llm-compressor create-tiny-model skill.

Safetensors model size: 1B params (tensor type: BF16)