GigaChat3.1-10B-A1.8B — MLX 4-bit

First MLX conversion of Sber's GigaChat 3.1, a DeepSeek V3-style MoE model, running natively on Apple Silicon.

Specs

| Metric | Value |
|---|---|
| Total params | 10B |
| Active params | 1.8B (4 of 64 experts per token) |
| Architecture | DeepseekV3ForCausalLM (MoE) |
| Layers | 26 |
| Hidden size | 1536 |
| Attention heads | 32 |
| Context | 262,144 tokens |
| Quantization | 4-bit (group_size=64, 4.5 bits/weight) |
| Size on disk | 5.6 GB |
| Speed | 116 tok/s on M3 Ultra |
| Peak memory | 5.6 GB |
| Languages | English, Russian |
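
The 4.5 bits/weight and 5.6 GB figures above can be sanity-checked with a little arithmetic. The sketch below assumes each group of 64 weights stores one 16-bit scale and one 16-bit bias next to the packed 4-bit values, and that all 10B parameters are quantized, which is a simplification. The function names are illustrative only.

```python
def bits_per_weight(q_bits: int, group_size: int, overhead_bits: int = 32) -> float:
    """Effective bits per weight: packed value plus per-group scale and bias."""
    # overhead_bits = 16-bit scale + 16-bit bias per group (assumed layout)
    return q_bits + overhead_bits / group_size


def estimated_size_gb(n_params: float, bpw: float) -> float:
    """Approximate weight size in decimal gigabytes."""
    return n_params * bpw / 8 / 1e9


bpw = bits_per_weight(q_bits=4, group_size=64)
print(f"{bpw} bits/weight")
print(f"{estimated_size_gb(10e9, bpw):.1f} GB")
```

Under these assumptions the estimate lands close to the reported size on disk. Embeddings or layers kept at higher precision would shift the real number slightly.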

Usage

pip install mlx-lm

# Quick generate
mlx_lm.generate --model RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit --prompt "Explain gradient descent:"

# Chat
mlx_lm.chat --model RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit

# Python API
from mlx_lm import load, generate

model, tokenizer = load("RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit")

messages = [{"role": "user", "content": "What is LoRA fine-tuning?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=500)
print(response)
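
For multi-turn use, one option is to keep the message history in a list and re-render the whole conversation through the chat template before each call. This is a minimal sketch: `add_turn` and `render_prompt` are hypothetical helper names, not part of mlx_lm, and the only tokenizer method they rely on is the `apply_chat_template` call shown above.

```python
def add_turn(messages, role, content):
    """Return a new history with one more message; the input list is not mutated."""
    return messages + [{"role": role, "content": content}]


def render_prompt(tokenizer, messages):
    """Render the full history with the tokenizer's chat template."""
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )


# Intended use with the model and tokenizer loaded above (not executed here):
#   history = add_turn([], "user", "What is LoRA fine-tuning?")
#   reply = generate(model, tokenizer, prompt=render_prompt(tokenizer, history), max_tokens=500)
#   history = add_turn(history, "assistant", reply)
#   history = add_turn(history, "user", "How does it differ from full fine-tuning?")
```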

Conversion Notes

  • Converted from ai-sage/GigaChat3.1-10B-A1.8B-bf16
  • Multi-token prediction (MTP) head stripped for mlx_lm compatibility (num_nextn_predict_layers set to 0, layer 26 weights removed)
  • Tokenizer regex warning is cosmetic and does not affect generation quality
  • Quantized with mlx_lm.convert --quantize --q-bits 4 --q-group-size 64
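
The MTP stripping step described above could look roughly like the following. This is a sketch and not the exact script used for this conversion: it assumes the MTP head is stored as extra decoder layers whose keys start with `model.layers.<i>.` for `i >= num_hidden_layers`, as in DeepSeek V3-style checkpoints. The function name and the example weight keys are hypothetical.

```python
def strip_mtp(config: dict, weights: dict) -> tuple[dict, dict]:
    """Drop MTP layers from a DeepSeek V3-style checkpoint.

    Assumes MTP weights are stored as extra decoder layers named
    `model.layers.<i>.` with i >= num_hidden_layers (assumed key layout).
    """
    n_layers = config["num_hidden_layers"]
    n_mtp = config.get("num_nextn_predict_layers", 0)
    # Trailing dot so that layer 26 does not also match layer 260.
    mtp_prefixes = tuple(
        f"model.layers.{i}." for i in range(n_layers, n_layers + n_mtp)
    )

    kept = {k: v for k, v in weights.items() if not k.startswith(mtp_prefixes)}
    new_config = {**config, "num_nextn_predict_layers": 0}
    return new_config, kept


# Toy example with placeholder values instead of real tensors.
config = {"num_hidden_layers": 26, "num_nextn_predict_layers": 1}
weights = {
    "model.layers.25.mlp.gate.weight": None,
    "model.layers.26.eh_proj.weight": None,
}
new_config, kept = strip_mtp(config, weights)
print(sorted(kept))
```

With 26 regular layers indexed 0 to 25, the single MTP layer sits at index 26, which matches the note above.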

Benchmarks

Tested on M3 Ultra (512GB):

| Test | Result |
|---|---|
| Coherent generation | PASS |
| Code generation | PASS |
| Technical Q&A (MLOps) | PASS |
| Reasoning puzzles | PASS (both trick questions correct) |
| Russian language | PASS (fluent) |
| Safety refusal | PASS |
| Speed > 80 tok/s | PASS (116 tok/s) |
| Memory < 10 GB | PASS (5.6 GB) |
| No degeneration | PASS |

32/32 validation tests passed before upload.
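
To reproduce the speed and memory checks on other hardware, a small timing harness is enough. The sketch below is one way to do it and is not the harness used for the numbers above. `run` stands for any callable that performs one generation and returns the number of tokens produced, and both function names are hypothetical.

```python
import time


def tokens_per_second(run, n_warmup: int = 1, n_runs: int = 3) -> float:
    """Average tokens/second of `run`, a callable returning the token count it generated."""
    for _ in range(n_warmup):
        run()  # warm-up pass, so one-time setup costs are not timed
    total_tokens = 0
    start = time.perf_counter()
    for _ in range(n_runs):
        total_tokens += run()
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed


def check_thresholds(tok_per_s: float, peak_gb: float) -> dict:
    """Apply the speed (> 80 tok/s) and memory (< 10 GB) thresholds listed above."""
    return {"speed": tok_per_s > 80, "memory": peak_gb < 10}
```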

About GigaChat

GigaChat is developed by Sber (Russia's largest bank) through its AI lab, ai-sage. It uses the DeepSeek V3 MoE architecture: 64 routed experts, 4 of them active per token, plus 1 shared expert. The 10B variant is the efficient model in the family, designed for fast inference with a small memory footprint.
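
To make the "4 of 64 routed experts plus 1 shared expert" layout concrete, here is a deliberately simplified routing sketch in plain Python. It uses a softmax over the selected logits; the actual DeepSeek V3 router differs in its details (scoring function, expert grouping, bias terms), so treat this as an illustration of top-k routing in general. All names are hypothetical, and experts are modeled as scalar functions.

```python
import math


def route(router_logits, top_k=4):
    """Pick the top_k routed experts and normalize their gate weights (simplified)."""
    top = sorted(
        range(len(router_logits)), key=lambda i: router_logits[i], reverse=True
    )[:top_k]
    m = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]


def moe_forward(x, router_logits, routed_experts, shared_expert, top_k=4):
    """Weighted sum of the selected routed experts plus the always-on shared expert."""
    out = shared_expert(x)
    for idx, w in route(router_logits, top_k):
        out += w * routed_experts[idx](x)
    return out


# Toy setup: 64 scalar "experts", expert i multiplies its input by i.
experts = [lambda x, s=i: s * x for i in range(64)]
logits = [0.0] * 64
for i in (3, 10, 20, 40):
    logits[i] = 5.0
print(route(logits))
```

Only the 4 selected experts and the shared expert run for a given token, which is why the active parameter count (1.8B) is far below the total (10B).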

Converted by RockTalk.
