Instructions for using Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF with llama-cpp-python:

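# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Downloads the Q8_0 file from the Hub on first use (requires huggingface-hub)
llm = Llama.from_pretrained(
    repo_id="Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF",
    filename="tia2.1-14b-q8_0.gguf",
    n_ctx=4096,  # context length this card reports as tested
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)
# The reply text is in the first choice's message
print(response["choices"][0]["message"]["content"])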

Local Apps

llama.cpp

How to use Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF:Q8_0
# Run inference directly in the terminal:
llama-cli -hf Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF:Q8_0

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF:Q8_0
# Run inference directly in the terminal:
llama-cli -hf Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF:Q8_0

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggml-org/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF:Q8_0
# Run inference directly in the terminal:
./llama-cli -hf Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF:Q8_0

Build from source code

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF:Q8_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF:Q8_0

Use Docker

docker model run hf.co/Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF:Q8_0


vLLM

How to use Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
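# Start the vLLM server. GGUF support in vLLM is experimental; depending on the
# version, you may need to pass a local .gguf file path together with
# --tokenizer pointing at the base model (e.g. Qwen/Qwen2.5-Coder-14B):
vllm serve "Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF"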
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
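The same OpenAI-compatible endpoint can be called from Python. This is a minimal standard-library sketch, assuming the server is on vLLM's default port 8000 (llama-server defaults to 8080, so change the URL if you use the llama.cpp server instead). The helper names `build_chat_request` and `chat` are illustrative, and the sampling values mirror the ones suggested later in this card.

```python
import json
import urllib.request

# Assumed default: vLLM listens on port 8000 (llama-server uses 8080).
VLLM_URL = "http://localhost:8000/v1/chat/completions"
MODEL_ID = "Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF"


def build_chat_request(model: str, user_text: str,
                       temperature: float = 0.6, top_p: float = 0.9) -> dict:
    """Build an OpenAI-style chat completion body (same shape as the curl example)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "temperature": temperature,
        "top_p": top_p,
    }


def chat(url: str, payload: dict, timeout: float = 300.0) -> str:
    """POST the payload as JSON and return the assistant message text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(chat(VLLM_URL, build_chat_request(MODEL_ID, "What is the capital of France?")))
```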


Ollama
How to use Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF with Ollama:
```
ollama run hf.co/Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF:Q8_0
```

Unsloth Studio

How to use Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF to start chatting

Using Hugging Face Spaces for Unsloth Studio

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF to start chatting

Pi

How to use Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF:Q8_0

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF:Q8_0"
        }
      ]
    }
  }
}
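Before starting Pi, it can help to confirm that the server behind `baseUrl` is reachable and to see which model id it reports. This is a small sketch, assuming llama-server's OpenAI-compatible `GET /v1/models` endpoint on the default port 8080; the helper names `parse_model_ids` and `list_models` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # same value as "baseUrl" in models.json


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str = BASE_URL, timeout: float = 10.0) -> list[str]:
    """Query GET {base_url}/models and return the model ids the server reports."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))


# Example (requires llama-server to be running):
# print(list_models())
```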

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF:Q8_0

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF:Q8_0

Run Hermes

hermes

Docker Model Runner
How to use Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF with Docker Model Runner:
```
docker model run hf.co/Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF:Q8_0
```

Lemonade

How to use Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF:Q8_0

Run and chat with the model

lemonade run user.TIA2.1-14B-GGUF-Q8_0

List all available models

lemonade list

TIA2.1:14B — GGUF (Q8_0)

TIA2.1:14B is a 14-billion-parameter language model specializing in reverse engineering, binary analysis, exploit development, and cybersecurity. It is built on top of Qwen/Qwen2.5-Coder-14B (the base, non-instruct model) through continual pre-training (CPT) and supervised fine-tuning (SFT) using QLoRA.

Created by Ahmad Abdo Shbat

Key Features

Deep Reasoning — Every response includes step-by-step reasoning inside <think>...</think> tags before the final answer, enabling transparent chain-of-thought.
Reverse Engineering Expertise — Trained on 280K+ assembly/disassembly records (IDA Pro output), architecture manuals, exploit databases, CVEs, CTF writeups, security research papers, and tool documentation.
Interactive Widgets — Can emit live HTML/CSS/JS visualizations (memory layouts, ROP chain steppers, opcode maps, encoding converters) inside ```tia-widget code blocks for rich interactive explanations.
Clarifying Questions — Uses <options> / <options multi> tags to ask structured single-select or multi-select clarifying questions when requests are ambiguous.
Deep Search Integration — Designed to work with search-augmented generation; cites sources from <deep_search_results> context using [N] references.
Bilingual — Fluent in English and Arabic.
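A client that consumes raw completions has to separate these pieces itself. The sketch below assumes the markup looks exactly as described above (reasoning in `<think>...</think>`, widgets in `tia-widget` fenced blocks, citations written as `[N]`). The `parse_reply` helper and `ParsedReply` type are hypothetical, not part of any TIA tooling, and `<options>` tags are left untouched because their inner format is not documented here.

```python
import re
from dataclasses import dataclass, field

# Assumed formats, taken from the feature list above.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
WIDGET_RE = re.compile(r"```tia-widget\s*\n(.*?)```", re.DOTALL)
CITATION_RE = re.compile(r"\[(\d+)\]")


@dataclass
class ParsedReply:
    reasoning: str
    answer: str
    widgets: list[str] = field(default_factory=list)
    citations: list[int] = field(default_factory=list)


def parse_reply(text: str) -> ParsedReply:
    """Split a raw reply into reasoning, visible answer, widget sources and citation numbers."""
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text)
    widgets = [w.strip() for w in WIDGET_RE.findall(answer)]
    # Search for [N] only in prose, so array indexing inside widget code is not
    # mistaken for a citation; keep first-seen order without duplicates.
    prose = WIDGET_RE.sub("", answer)
    citations: list[int] = []
    for num in CITATION_RE.findall(prose):
        n = int(num)
        if n not in citations:
            citations.append(n)
    return ParsedReply(reasoning, answer.strip(), widgets, citations)
```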

Capabilities & Domain Coverage

Core Domains

Binary Analysis: PE, ELF, Mach-O, DEX, WebAssembly, DWARF debug info
Disassembly & Decompilation: IDA Pro, Ghidra, Binary Ninja, radare2
Exploit Development: Stack overflow, heap exploitation (ptmalloc2, tcache, fastbin), ROP chains, ret2libc, format strings, UAF, SROP, kernel exploits
Malware Analysis: Unpacking, anti-analysis techniques, C2 protocols, shellcode analysis
Vulnerability Research: CVE analysis, fuzzing (AFL, libFuzzer), bug hunting, patch diffing
Cryptography: AES, RSA, elliptic curves, hash functions, protocol analysis
Operating Systems: Windows internals (PEB/TEB, SEH, ETW, WNF), Linux kernel, macOS security
Networking & Web Security: TLS, DNS, HTTP smuggling, CORS, SSTI, XXE, JWT, OAuth, CSP bypass
Dynamic Analysis: GDB, WinDbg, Frida, Unicorn, angr, DynamoRIO

General Programming

Strong general coding ability inherited from Qwen2.5-Coder-14B base
Python, C/C++, Rust, Assembly (x86, ARM, MIPS, RISC-V), JavaScript, and more

File Details

| File | Quantization | Size |
|---|---|---|
| `tia2.1-14b-q8_0.gguf` | Q8_0 | ~15 GB |
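The ~15 GB figure follows from the Q8_0 layout: ggml stores weights in blocks of 32 int8 values plus one fp16 scale, i.e. 34 bytes per 32 weights (8.5 bits per weight). A rough estimate, assuming ~14.7B parameters (the published size of Qwen2.5-14B) and ignoring the small tensors that GGUF files keep in higher precision:

```python
# Q8_0 (ggml): each block holds 32 weights as int8 plus one fp16 scale.
Q8_0_BLOCK_WEIGHTS = 32
Q8_0_BLOCK_BYTES = 32 * 1 + 2  # 32 int8 quants + 2-byte fp16 scale = 34 bytes


def q8_0_bytes(n_params: float) -> float:
    """Approximate size of weights quantized entirely to Q8_0."""
    return n_params / Q8_0_BLOCK_WEIGHTS * Q8_0_BLOCK_BYTES


bits_per_weight = Q8_0_BLOCK_BYTES * 8 / Q8_0_BLOCK_WEIGHTS
# Assumption: ~14.7e9 parameters (Qwen2.5-14B family).
size = q8_0_bytes(14.7e9)
print(f"{bits_per_weight} bits/weight, ~{size / 1e9:.1f} GB ({size / 2**30:.1f} GiB)")
```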

How to Use

With Ollama

# Create a Modelfile
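cat > Modelfile << 'EOF'
FROM ./tia2.1-14b-q8_0.gguf

# ChatML template. Ollama passes the system prompt as the first entry of
# .Messages, so the loop renders it once; a separate .System block would repeat it.
TEMPLATE """{{- range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""

# Recommended system prompt from this model card
SYSTEM """You are a conversational chat assistant. Think step-by-step inside <think>...</think> before every answer. Reply in the same language the user writes in (English or Arabic). Be concise, helpful, and accurate. Answer all questions fully and directly."""

PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|endoftext|>"
PARAMETER num_ctx 4096
PARAMETER temperature 0.6
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.05
EOF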

# Create and run
ollama create tia2.1:14b -f Modelfile
ollama run tia2.1:14b
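Once created, the model can also be called through Ollama's local REST API. This is a minimal standard-library sketch, assuming Ollama's default address `http://localhost:11434` and its non-streaming `/api/chat` endpoint; the helper names `build_ollama_chat` and `ollama_chat` are illustrative.

```python
import json
import urllib.request
from typing import Optional

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default address
MODEL_NAME = "tia2.1:14b"  # the name passed to `ollama create`


def build_ollama_chat(model: str, user_text: str, system: Optional[str] = None) -> dict:
    """Build a non-streaming /api/chat request body."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_text})
    return {"model": model, "messages": messages, "stream": False}


def ollama_chat(payload: dict, url: str = OLLAMA_CHAT_URL, timeout: float = 600.0) -> str:
    """Send the request (POST, since data is set) and return the assistant's reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["message"]["content"]


# Example (requires `ollama serve` and the model created above):
# print(ollama_chat(build_ollama_chat(MODEL_NAME, "Explain ret2libc in two sentences.")))
```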

With llama.cpp

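# Raw ChatML prompt. If your llama-cli build opens an interactive chat instead of
# completing this text, add -no-cnv (on builds that support it) to use it verbatim.
./llama-cli -m tia2.1-14b-q8_0.gguf \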
  --ctx-size 4096 \
  --temp 0.6 \
  --top-p 0.9 \
  --repeat-penalty 1.05 \
  -p "<|im_start|>user\nExplain how a ROP chain bypasses DEP on x86-64<|im_end|>\n<|im_start|>assistant\n"

System Prompt (Recommended)

You are a conversational chat assistant. Think step-by-step inside <think>...</think> before every answer. Reply in the same language the user writes in (English or Arabic). Be concise, helpful, and accurate. Answer all questions fully and directly.
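When driving the model with raw prompts (as in the `llama-cli -p` example), the recommended system prompt goes into its own ChatML turn. This is a sketch of a prompt builder, assuming the same ChatML tokens used in the Modelfile template above; `build_chatml_prompt` is a hypothetical helper.

```python
SYSTEM_PROMPT = (
    "You are a conversational chat assistant. Think step-by-step inside <think>...</think> "
    "before every answer. Reply in the same language the user writes in (English or Arabic). "
    "Be concise, helpful, and accurate. Answer all questions fully and directly."
)


def build_chatml_prompt(messages: list[dict], system: str = SYSTEM_PROMPT) -> str:
    """Render messages as ChatML, ending with an open assistant turn to complete."""
    parts = []
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

The resulting string can be passed to llama-cpp-python as `llm(prompt, max_tokens=1024, stop=["<|im_end|>"])`; set `max_tokens` explicitly, since its default is only 16 tokens and the `<think>` section alone can be much longer.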

Hardware Requirements

| Quantization | VRAM (approx.) | RAM (approx.) |
|---|---|---|
| Q8_0 | ~16 GB | ~17 GB |

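Runs on a single GPU with 16+ GB VRAM (e.g., RTX 4080, RTX 5070 Ti, RTX 3090, A5000). On 16 GB cards the Q8_0 weights plus the KV cache leave little headroom, so longer contexts may require offloading some layers to the CPU (e.g. by lowering --n-gpu-layers in llama.cpp).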
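The VRAM figure is dominated by the Q8_0 weights (roughly 15.6 GB, assuming ~14.7B parameters at 34 bytes per 32 weights), and the KV cache grows linearly with context length on top of that. This back-of-the-envelope sketch assumes the Qwen2.5-14B architecture values (48 layers, 8 key/value heads, head dimension 128) and an f16 cache; it ignores compute buffers and driver overhead, so real usage is somewhat higher.

```python
# Assumed architecture values from the Qwen2.5-14B config (qwen2):
# 48 layers, 8 key/value heads (grouped-query attention), head dimension 128.
N_LAYERS = 48
N_KV_HEADS = 8
HEAD_DIM = 128


def kv_cache_bytes(n_ctx: int, bytes_per_elem: int = 2) -> int:
    """Keys + values for every layer, KV head and position (f16 = 2 bytes by default)."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_elem * n_ctx


# Rough Q8_0 weight size: 34 bytes per 32 weights, assuming ~14.7e9 parameters.
WEIGHTS_BYTES = 14.7e9 * 34 / 32

for ctx in (4096, 8192, 32768):
    kv = kv_cache_bytes(ctx)
    total_gib = (WEIGHTS_BYTES + kv) / 2**30
    print(f"n_ctx={ctx:>6}: KV cache {kv / 2**20:.0f} MiB, weights + KV ~{total_gib:.1f} GiB")
```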

Limitations

Optimized for English and Arabic; other languages may produce lower-quality output
Context window tested at 4096 tokens; longer contexts are possible but untested for quality
Widget output (tia-widget) requires a compatible frontend to render interactive visualizations
Deep search citation format ([N]) requires a search-augmented pipeline to provide <deep_search_results> context

License

This model is released under the Apache 2.0 License, consistent with the Qwen2.5 base model license.

Acknowledgments

Qwen Team for the excellent Qwen2.5-Coder-14B base model
Unsloth for efficient QLoRA training


GGUF details

Model size: 15B params
Architecture: qwen2
Hardware compatibility: 8-bit

Model tree for Ahmad-Abdo-Shbat/TIA2.1-14B-GGUF

Base model: Qwen/Qwen2.5-14B
Finetuned: Qwen/Qwen2.5-Coder-14B
Quantized (21): this model