Instructions to use RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit with libraries, inference providers, notebooks, and local apps.

Libraries

How to use RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

Pi

How to use RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit"
        }
      ]
    }
  }
}
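
If ~/.pi/agent/models.json already lists other providers, pasting the block above over the file would drop them. The following is a minimal sketch of merging the entry instead. The helper names `merge_provider` and `update_models_file` are hypothetical, and the sketch assumes the file, when present, holds a single JSON object with a top-level "providers" key as shown above.

```python
import json
from pathlib import Path

MODEL_ID = "RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit"

# Provider entry copied from the models.json snippet above.
MLX_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": MODEL_ID}],
}


def merge_provider(config: dict, name: str = "mlx-lm", provider: dict = MLX_PROVIDER) -> dict:
    """Return a copy of `config` with `provider` stored under providers[name]."""
    merged = dict(config)
    providers = dict(merged.get("providers", {}))
    providers[name] = provider  # replaces an existing entry with the same name
    merged["providers"] = providers
    return merged


def update_models_file(path: Path = Path.home() / ".pi" / "agent" / "models.json") -> None:
    """Hypothetical helper: read the file if it exists, merge, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_provider(existing), indent=2) + "\n")
```

Calling `update_models_file()` once after installing Pi should leave any other providers in place and add or refresh the "mlx-lm" entry.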

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit

Run Hermes

hermes

MLX LM

How to use RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit"
# In a second terminal, call the OpenAI-compatible server with curl
# (mlx_lm.server listens on port 8080 by default)
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
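
The same request can be sent from Python using only the standard library. This is a minimal sketch, assuming the server is listening on port 8080 (the mlx_lm.server default; pass a different `base_url` otherwise) and that the response follows the usual OpenAI chat-completions shape. The function names `build_chat_request` and `chat` are illustrative and are not part of mlx-lm.

```python
import json
import urllib.request

MODEL_ID = "RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit"


def build_chat_request(prompt: str, base_url: str = "http://localhost:8080/v1") -> urllib.request.Request:
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the prompt to the local server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    # Assumes the OpenAI-style layout: choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello"))` should print the model's reply.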

GigaChat3.1-10B-A1.8B — MLX 4-bit

First MLX conversion of Sber's GigaChat 3.1. DeepSeek V3 MoE architecture running natively on Apple Silicon.

Specs

Metric	Value
Total params	10B
Active params	1.8B (4 of 64 experts per token)
Architecture	DeepseekV3ForCausalLM (MoE)
Layers	26
Hidden size	1536
Attention heads	32
Context	262,144 tokens
Quantization	4-bit (group_size=64, 4.5 bits/weight)
Size on disk	5.6 GB
Speed	116 tok/s on M3 Ultra
Peak memory	5.6 GB
Languages	English, Russian
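
The 4.5 bits/weight figure and the 5.6 GB size can be sanity-checked with a little arithmetic. This is a rough sketch, assuming each group of 64 weights stores one 16-bit scale and one 16-bit bias next to its 4-bit values, and using the nominal 10B parameter count. It ignores any tensors left unquantized, so treat the result as an estimate.

```python
def effective_bits_per_weight(bits: int = 4, group_size: int = 64,
                              scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Quantized bits plus the per-group scale/bias overhead spread over the group."""
    return bits + (scale_bits + bias_bits) / group_size


def approx_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


bpw = effective_bits_per_weight()
print(bpw)                         # 4.5
print(approx_size_gb(10e9, bpw))   # 5.625
```

Under these assumptions the estimate lands close to the 5.6 GB listed for size on disk.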

Usage

pip install mlx-lm

# Quick generate
mlx_lm.generate --model RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit --prompt "Explain gradient descent:"

# Chat
mlx_lm.chat --model RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit

from mlx_lm import load, generate

model, tokenizer = load("RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit")

messages = [{"role": "user", "content": "What is LoRA fine-tuning?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=500)
print(response)

Conversion Notes

Converted from ai-sage/GigaChat3.1-10B-A1.8B-bf16
Multi-token prediction (MTP) head stripped for mlx_lm compatibility (num_nextn_predict_layers set to 0, layer 26 weights removed)
Tokenizer regex warning is cosmetic and does not affect generation quality
Quantized with mlx_lm.convert --quantize --q-bits 4 --q-group-size 64
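
The MTP stripping step can be pictured as a filter over the checkpoint's weight names. The sketch below is illustrative only and is not the script used for this conversion. It assumes weights are keyed as `model.layers.<index>.…`, that the config carries `num_hidden_layers`, and that the MTP block sits at the index right after the last regular layer (26 for this model). `strip_mtp` is a hypothetical helper name.

```python
def strip_mtp(config: dict, weights: dict) -> tuple[dict, dict]:
    """Drop the extra MTP layer and mark the config as having no MTP layers."""
    n_layers = config["num_hidden_layers"]        # regular layers are 0..n_layers-1
    mtp_prefix = f"model.layers.{n_layers}."      # assumed location of the MTP block
    new_config = {**config, "num_nextn_predict_layers": 0}
    new_weights = {k: v for k, v in weights.items() if not k.startswith(mtp_prefix)}
    return new_config, new_weights


# Toy example with placeholder values instead of real tensors.
config = {"num_hidden_layers": 26, "num_nextn_predict_layers": 1}
weights = {"model.layers.25.mlp.w": 1, "model.layers.26.mlp.w": 2, "lm_head.weight": 3}
new_config, new_weights = strip_mtp(config, weights)
print(sorted(new_weights))  # ['lm_head.weight', 'model.layers.25.mlp.w']
```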

Benchmarks

Tested on M3 Ultra (512 GB):

Test	Result
Coherent generation	PASS
Code generation	PASS
Technical Q&A (MLOps)	PASS
Reasoning puzzles	PASS (both trick questions correct)
Russian language	PASS (fluent)
Safety refusal	PASS
Speed > 80 tok/s	PASS (116 tok/s)
Memory < 10 GB	PASS (5.6 GB)
No degeneration	PASS

32/32 validation tests passed before upload.
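
The card does not say how the speed figure was measured. As a hedged illustration, a throughput check against the thresholds in the table (more than 80 tok/s, less than 10 GB) could look like the sketch below. `measure_tok_per_s` and `passes_thresholds` are hypothetical helpers, and `generate_fn` stands in for any callable that runs one generation and returns the number of tokens produced.

```python
import time
from typing import Callable


def measure_tok_per_s(generate_fn: Callable[[], int]) -> float:
    """Time one call to generate_fn and return tokens generated per second."""
    start = time.perf_counter()
    n_tokens = generate_fn()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed


def passes_thresholds(tok_per_s: float, peak_gb: float,
                      min_tok_per_s: float = 80.0, max_gb: float = 10.0) -> bool:
    """Apply the speed and memory thresholds from the benchmark table."""
    return tok_per_s > min_tok_per_s and peak_gb < max_gb


print(passes_thresholds(116.0, 5.6))  # True
```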

About GigaChat

GigaChat is developed by Sber (Russia's largest bank) through its AI lab ai-sage. It uses the DeepSeek V3 MoE architecture: 64 routed experts with 4 active per token, plus 1 shared expert that processes every token. The 10B variant is the efficiency-oriented model, designed for fast inference with a small memory footprint.
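
To make "4 of 64 experts per token, plus 1 shared expert" concrete, here is a simplified top-k routing sketch in plain Python. It is not the model's actual router: the real DeepSeek V3 gate may score and normalize experts differently. Tokens and experts are reduced to scalars and scalar functions purely for illustration.

```python
import math
import random
from typing import Callable

N_ROUTED, TOP_K = 64, 4


def route(logits: list[float], top_k: int = TOP_K) -> dict[int, float]:
    """Pick the top_k experts by router score; return {expert_index: weight}."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    peak = max(logits[i] for i in top)
    # Softmax over the selected experts only, so the weights sum to 1.
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_output(x: float, logits: list[float],
               experts: list[Callable[[float], float]],
               shared_expert: Callable[[float], float]) -> float:
    """Shared expert always runs; routed experts contribute by gate weight."""
    gates = route(logits)
    routed = sum(w * experts[i](x) for i, w in gates.items())
    return shared_expert(x) + routed


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(N_ROUTED)]
gates = route(logits)
print(len(gates))                     # 4
print(round(sum(gates.values()), 6))  # 1.0
```

Only the 4 selected experts run for a given token, which is why the active parameter count (1.8B) is much smaller than the total (10B).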

Converted by RockTalk.


Safetensors

Model size: 11B params
Tensor type: BF16, U32
MLX, 4-bit

Model tree for RockTalk/GigaChat3.1-10B-A1.8B-MLX-4bit

Base model: ai-sage/GigaChat3-10B-A1.8B-base
Finetuned: ai-sage/GigaChat3.1-10B-A1.8B-bf16
Quantized (4): this model