TIA2.1:14B - GGUF (Q8_0)
TIA2.1:14B is a 14-billion-parameter language model specializing in reverse engineering, binary analysis, exploit development, and cybersecurity. It is built on Qwen/Qwen2.5-Coder-14B (base, non-instruct) through continual pre-training (CPT) and supervised fine-tuning (SFT) using QLoRA.
Created by Ahmad Abdo Shbat
Key Features
- Deep Reasoning: Every response includes step-by-step reasoning inside `<think>...</think>` tags before the final answer, enabling transparent chain-of-thought.
- Reverse Engineering Expertise: Trained on 280K+ assembly/disassembly records (IDA Pro output), architecture manuals, exploit databases, CVEs, CTF writeups, security research papers, and tool documentation.
- Interactive Widgets: Can emit live HTML/CSS/JS visualizations (memory layouts, ROP chain steppers, opcode maps, encoding converters) inside `tia-widget` fenced code blocks for rich interactive explanations.
- Clarifying Questions: Uses `<options>` / `<options multi>` tags to ask structured single-select or multi-select clarifying questions when requests are ambiguous.
- Deep Search Integration: Designed to work with search-augmented generation; cites sources from `<deep_search_results>` context using `[N]` references.
- Bilingual: Fluent in English and Arabic.
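Because the reasoning arrives inline with the answer, a client will usually want to separate the two before display. The following is a minimal sketch of one way to do that, assuming the response contains at most one `<think>...</think>` block; the function name `split_reasoning` is hypothetical and not part of any TIA tooling.

```python
import re
from typing import Tuple

# Matches the first <think>...</think> block; DOTALL lets the reasoning span lines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response: str) -> Tuple[str, str]:
    """Return (reasoning, answer) from a raw model response.

    If no <think> block is present, the reasoning is empty and the whole
    response is treated as the answer.
    """
    match = _THINK_RE.search(response)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    # Everything outside the matched block is the user-facing answer.
    answer = (response[: match.start()] + response[match.end():]).strip()
    return reasoning, answer
```

A frontend could show `answer` by default and keep `reasoning` behind a collapsible panel.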
Capabilities & Domain Coverage
Core Domains
- Binary Analysis: PE, ELF, Mach-O, DEX, WebAssembly, DWARF debug info
- Disassembly & Decompilation: IDA Pro, Ghidra, Binary Ninja, radare2
- Exploit Development: Stack overflow, heap exploitation (ptmalloc2, tcache, fastbin), ROP chains, ret2libc, format strings, UAF, SROP, kernel exploits
- Malware Analysis: Unpacking, anti-analysis techniques, C2 protocols, shellcode analysis
- Vulnerability Research: CVE analysis, fuzzing (AFL, libFuzzer), bug hunting, patch diffing
- Cryptography: AES, RSA, elliptic curves, hash functions, protocol analysis
- Operating Systems: Windows internals (PEB/TEB, SEH, ETW, WNF), Linux kernel, macOS security
- Networking & Web Security: TLS, DNS, HTTP smuggling, CORS, SSTI, XXE, JWT, OAuth, CSP bypass
- Dynamic Analysis: GDB, WinDbg, Frida, Unicorn, angr, DynamoRIO
General Programming
- Strong general coding ability inherited from Qwen2.5-Coder-14B base
- Python, C/C++, Rust, Assembly (x86, ARM, MIPS, RISC-V), JavaScript, and more
File Details
| File | Quantization | Size |
|---|---|---|
| tia2.1-14b-q8_0.gguf | Q8_0 | ~15 GB |
How to Use
With Ollama
# Create a Modelfile
cat > Modelfile << 'EOF'
FROM ./tia2.1-14b-q8_0.gguf
TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{- range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|endoftext|>"
PARAMETER num_ctx 4096
PARAMETER temperature 0.6
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.05
EOF
# Create and run
ollama create tia2.1:14b -f Modelfile
ollama run tia2.1:14b
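Once the model has been created, it can also be queried programmatically through Ollama's local HTTP API. The sketch below assumes Ollama is listening on its default address (`http://localhost:11434`) and that the model was created under the name `tia2.1:14b` as above; the helper names are hypothetical. The `options` simply repeat the Modelfile's sampling values so the request is explicit about them.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_payload(prompt: str, model: str = "tia2.1:14b") -> dict:
    """Build a non-streaming chat request mirroring the Modelfile parameters."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {
            "num_ctx": 4096,
            "temperature": 0.6,
            "top_p": 0.9,
            "repeat_penalty": 1.05,
        },
    }


def chat(prompt: str) -> str:
    """Send the request to a running Ollama server and return the reply text."""
    data = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_CHAT_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["message"]["content"]
```

Calling `chat("Explain how a ROP chain bypasses DEP on x86-64")` requires the Ollama server to be running; `build_chat_payload` can be inspected without it.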
With llama.cpp
./llama-cli -m tia2.1-14b-q8_0.gguf \
--ctx-size 4096 \
--temp 0.6 \
--top-p 0.9 \
--repeat-penalty 1.05 \
-p "<|im_start|>user\nExplain how a ROP chain bypasses DEP on x86-64<|im_end|>\n<|im_start|>assistant\n"
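The `-p` argument above is a hand-written ChatML prompt. For multi-turn conversations it is easier to generate that string than to type it. This is a minimal sketch of such a builder, assuming the same `<|im_start|>` / `<|im_end|>` layout used in the command and in the Ollama template; `build_chatml_prompt` is a hypothetical helper name.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_chatml_prompt(messages: List[Dict[str, str]]) -> str:
    """Render messages as ChatML and leave the assistant turn open."""
    parts = []
    for message in messages:
        # Each turn is: <|im_start|>ROLE\nCONTENT<|im_end|>\n
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    # The trailing open assistant turn tells the model to start generating.
    parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    prompt = build_chatml_prompt(
        [{"role": "user", "content": "Explain how a ROP chain bypasses DEP on x86-64"}]
    )
    print(prompt)
```

The string produced here contains real newlines, whereas the shell command passes literal `\n` sequences and relies on llama.cpp to interpret them.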
System Prompt (Recommended)
You are a conversational chat assistant. Think step-by-step inside <think>...</think> before every answer. Reply in the same language the user writes in (English or Arabic). Be concise, helpful, and accurate. Answer all questions fully and directly.
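As an illustration of how the recommended system prompt might be wired in, the sketch below uses llama-cpp-python with a locally downloaded copy of the GGUF file. The local path, the helper names, and the choice to reuse the sampling values from the usage section are assumptions; it also assumes the chat template stored in the GGUF metadata is the ChatML format shown earlier.

```python
from typing import Dict, List

SYSTEM_PROMPT = (
    "You are a conversational chat assistant. Think step-by-step inside "
    "<think>...</think> before every answer. Reply in the same language the "
    "user writes in (English or Arabic). Be concise, helpful, and accurate. "
    "Answer all questions fully and directly."
)


def build_messages(user_text: str) -> List[Dict[str, str]]:
    """Prepend the recommended system prompt to a single user turn."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
    ]


def ask(user_text: str, model_path: str = "tia2.1-14b-q8_0.gguf") -> str:
    """Run one chat completion; model_path is an assumed local file location."""
    # Imported lazily so build_messages works without llama-cpp-python installed.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, n_ctx=4096)
    result = llm.create_chat_completion(
        messages=build_messages(user_text),
        temperature=0.6,
        top_p=0.9,
        repeat_penalty=1.05,
    )
    return result["choices"][0]["message"]["content"]
```

`ask(...)` loads the full model on every call, which is fine for a one-off test; a longer-running program would create the `Llama` object once and reuse it.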
Hardware Requirements
| Quantization | VRAM (approx) | RAM (approx) |
|---|---|---|
| Q8_0 | ~16 GB | ~17 GB |
Runs on a single GPU with 16+ GB VRAM (e.g., RTX 4080, RTX 5070 Ti, RTX 3090, A5000).
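The figures above can be sanity-checked with a back-of-the-envelope estimate. Q8_0 stores weights in blocks of 32 int8 values plus one 16-bit scale, which works out to 34 bytes per 32 weights (8.5 bits per weight). The sketch below applies that to the nominal 14 billion parameters and adds an fp16 KV cache; the layer count, KV head count, and head dimension in the usage note are assumptions taken to match the Qwen2.5 14B architecture and should be checked against the GGUF metadata.

```python
Q8_0_BLOCK_WEIGHTS = 32  # weights per Q8_0 block
Q8_0_BLOCK_BYTES = 34    # 32 int8 values + one 2-byte fp16 scale


def q8_0_weight_bytes(n_params: float) -> float:
    """Approximate size of the weights if every tensor were stored as Q8_0."""
    return n_params * Q8_0_BLOCK_BYTES / Q8_0_BLOCK_WEIGHTS


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_value: int = 2) -> int:
    """Size of the KV cache; keys and values are both stored, hence the factor 2."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_value


if __name__ == "__main__":
    weights = q8_0_weight_bytes(14e9)  # nominal parameter count from this card
    # Assumed architecture values (48 layers, 8 KV heads, head dimension 128).
    cache = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128, n_ctx=4096)
    print(f"weights:  {weights / 1e9:.1f} GB")
    print(f"KV cache: {cache / 1e9:.1f} GB")
```

Under these assumptions the weights come to about 14.9 GB and the 4096-token KV cache to about 0.8 GB, which is in line with the ~15 GB file and ~16 GB VRAM figures. The true parameter count is somewhat above the nominal 14 billion, and compute buffers add overhead, so treat the result as a lower bound.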
Limitations
- Optimized for English and Arabic; other languages may produce lower-quality output
- Context window tested at 4096 tokens; longer contexts are possible but untested for quality
- Widget output (`tia-widget`) requires a compatible frontend to render interactive visualizations
- Deep search citation format (`[N]`) requires a search-augmented pipeline to provide `<deep_search_results>` context
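A frontend that cannot render widgets may still want to strip them out so the raw HTML does not clutter the reply. The following is a minimal sketch, assuming each widget is a fenced block whose info string starts with `tia-widget`; the exact payload format is not specified here, so the placeholder text and the `extract_widgets` name are hypothetical.

```python
import re
from typing import List, Tuple

# Built programmatically so this example can itself sit inside a fenced block.
FENCE = "`" * 3
_WIDGET_RE = re.compile(
    re.escape(FENCE) + r"tia-widget[^\n]*\n(.*?)" + re.escape(FENCE), re.DOTALL
)


def extract_widgets(response: str) -> Tuple[str, List[str]]:
    """Return (text_without_widgets, widget_sources).

    Each widget block is replaced by a short placeholder in the text and its
    body is returned separately so a capable frontend can render it.
    """
    widgets = [m.group(1).strip() for m in _WIDGET_RE.finditer(response)]
    text = _WIDGET_RE.sub("[widget omitted]", response).strip()
    return text, widgets
```

A frontend that does support widgets could pass each returned source to a sandboxed iframe instead of discarding it.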
License
This model is released under the Apache 2.0 License, consistent with the Qwen2.5 base model license.
Acknowledgments