Instructions to use TheDrummer/Precog-24B-v1-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use TheDrummer/Precog-24B-v1-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="TheDrummer/Precog-24B-v1-GGUF", filename="Precog-24B-v1b-Q2_K.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use TheDrummer/Precog-24B-v1-GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf TheDrummer/Precog-24B-v1-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf TheDrummer/Precog-24B-v1-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf TheDrummer/Precog-24B-v1-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf TheDrummer/Precog-24B-v1-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf TheDrummer/Precog-24B-v1-GGUF:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf TheDrummer/Precog-24B-v1-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf TheDrummer/Precog-24B-v1-GGUF:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf TheDrummer/Precog-24B-v1-GGUF:Q4_K_M
Use Docker
docker model run hf.co/TheDrummer/Precog-24B-v1-GGUF:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use TheDrummer/Precog-24B-v1-GGUF with Ollama:
ollama run hf.co/TheDrummer/Precog-24B-v1-GGUF:Q4_K_M
- Unsloth Studio new
How to use TheDrummer/Precog-24B-v1-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for TheDrummer/Precog-24B-v1-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for TheDrummer/Precog-24B-v1-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for TheDrummer/Precog-24B-v1-GGUF to start chatting
- Pi new
How to use TheDrummer/Precog-24B-v1-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf TheDrummer/Precog-24B-v1-GGUF:Q4_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "TheDrummer/Precog-24B-v1-GGUF:Q4_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use TheDrummer/Precog-24B-v1-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf TheDrummer/Precog-24B-v1-GGUF:Q4_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default TheDrummer/Precog-24B-v1-GGUF:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use TheDrummer/Precog-24B-v1-GGUF with Docker Model Runner:
docker model run hf.co/TheDrummer/Precog-24B-v1-GGUF:Q4_K_M
- Lemonade
How to use TheDrummer/Precog-24B-v1-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull TheDrummer/Precog-24B-v1-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.Precog-24B-v1-GGUF-Q4_K_M
List all available models
lemonade list
Join our Discord! https://discord.gg/BeaverAI
More than 8000 members strong ๐ช A hub for users and makers alike!
Drummer is open for new opportunities: https://linktr.ee/thelocaldrummer
Thank you to everyone who subscribed through Patreon. Your support helps me chug along in this brave new world.
FAQ for those out-of-the-loop
๐ถ Who is Drummer?
Hi! I'm Drummer. I'm a Software Engineer with experience in JavaScript, Golang, Python, and generally engineering the crap out of things.
Why I'm in the AI space:
- Exploration: Everyone is trying to figure out how AI works and what it's capable of. I am too - just not in creating the smartest, safest model at all costs.
- Upskill: The world is headed towards AI. It is here to stay. This has been my way of brushing up in this new form of computing challenge.
- Value: I yearn to create value. I feel satisfaction and fulfillment in providing something meaningful for others.
- Fun: It's just fun using and making models. It's also fun coming up with theories and realizing them in practice (training AI).
I started my tuning venture back in mid-2024 when I wanted to improve its literary capabilities. I've come a long way since then and I have branched out and specialized. Foundational models today are optimized for non-creative uses, and I believe there is a place for AI in creativity and entertainment.
I am here to take the road less traveled by.
โ What are my models like?
Bottomline: My models are usually geared towards creativity, usability, and entertainment!
While intelligence, correctness, and problem solving are not my priority, they are still one of many qualities I want in my models.
The primary goal is to enhance the experience for users looking to use models for creative uses, and other use cases which require no alignment.
In an effort to make it clear to myself and to others what I'm aiming for, I've identified certain qualities that my users often want:
Creativity
- Writing: Does it string together words and sentences in a pleasant & effective way? Does it feel like a writer?
- Dynamism: How good is the AI at being compelling and intriguing in its storytelling?
- Imagination: Can the AI navigate through a plethora of possibilities? Can it skirt incoherence and rise up to absolute coherence at the end of it?
(Dis)alignment
- Attitude: Does it refuse in both soft or hard ways? Does it lean towards certain corporate/religious/political ethics & beliefs? How does it see the user and itself?
- Morality: Does it know ethics? Is its language infected with forced positivity? If not, can it still moralize over difficult & dubious themes?
- Formatting: How stubborn is it with its established formatting? Can it create effective and novel formats to answer the prompt?
Intelligence
- Adherence: Can it follow instructions? Is it sticking to the prompt? Can it understsand you?
- Knowledge: Does it know about the world in both fictional and non-fictional way?
- Perception: Can it handle nuance, complexity, and logic?
If it doesn't excel in one of these qualities, or if it's overall mediocre for its size, then I would most likely reiterate until I get something right.
๐ก Philosophy
A person is defined by the language they use. Not whether they speak in English or German, but in how they perceive reality.
Just like how we associate a serial killer as a mind that can't map 'murder' to 'evil', an innocent person is a mind that simply can't imagine 'murder'. They get confused when forced to deal with such subjects.
AI's use of language speaks volumes about their 'perception' of reality. If a language model has been skewed and limited to a positive perception, then it's ability to imagine is also limited.
Finetuning is an opportunity to adjust and broaden the language. Corporations use it to achieve safety and compliance. I'm here to ACK-
TheDrummer proudly presents...
Precog 24B v1
Description
Precog is a 'reasoning' model that thinks in a different way. Instead of breaking down the question and coming up with a solution, it will instead provide an overview of the response like:
The intention is to have the model write a simple, digestible draft and then using the draft to write the actual, complicated response. The draft can be prefilled and edited to influence the actual response. I've always wondered what would happen if the model had solid basis for what it's about to write (as an actual response).
Strengths
- Improved narrative flow and storytelling.
- Provides a short draft you can inspect / modify before the AI writes the actual response, saving time and effort.
- Allows the model to plan out the response without spending too much tokens on it.
- Possibly better prompt adherence / instruction following.
- Uses the standard
<think>format, meaning it's plug & play for most frontends once configured for reasoning.
Weaknesses
- The model's 'reasoning' / draft might not always align with the actual response.
- Conventional reasoning like Behemoth R1 / Cydonia R1 might do better in detecting and portraying nuances.
Usage
- Mistral v7 Tekken
- Prefill
<think>in case it doesn't think. - Make sure any additional prefills inside
<think>works well with the new reasoning pattern. - Like my other reasoning tunes, you can invent new think tags like
<evil_think>or<slowburn_think>to influence how it should write.
Links
- Original: https://huggingface.co/TheDrummer/Precog-24B-v1
- GGUF: https://huggingface.co/TheDrummer/Precog-24B-v1-GGUF
- iMatrix (recommended): https://huggingface.co/bartowski/TheDrummer_Precog-24B-v1-GGUF
- EXL3: https://huggingface.co/ArtusDev/TheDrummer_Precog-24B-v1-EXL3
Feedback
Cydonia R1 4.1 does nuance on a moment to moment basis good and has given me some really "human" responses that way. Whereas Precog reasoning is lazer sharp, super quick, and more RP context aware. Precog seems to remember what its doing better and execute in a logical sequence. E.g move a little closer, sit down, cross legs.
This is quite good. The reasoning is great, short and more of a brief outline of the output which is embellished from the outline. The prose isn't wildly creative but the model seems pretty good at keeping a narrative going while not forgetting important char and plot details from earlier outputs. Seems pretty insensitive to temperature setting, didn't really feel that 1 temp was much more creative than .75, feels like the reasoning really locks it into adhering to the prompt regardless of the temp. Which is fine by me, I'll take prompt adherence with mildly creative prose over wild all over the place prose.
First reasoning model that really felt like its reasoning was finetuned for rp/stories. Earlier Drummer reasoning tunes seemed like the reasoning was treating rp prompts/cards like an equation or science problem to solve for, whereas the reasoning here is a skeletal synopsis of the final output which makes much more sense for reasoning in rp/storywriting.
This model, however, is more interesting. It first creates a short plot development plan and then expands it in the appropriate response. In my opinion, it works great. This is the first 24B model where I can easily reach a 28k context window and it's quite stable. The quality of the prose is... sufficient, at least for me.
Liked this one so much I used it all evening. Thinking is really good, and the thinking is really adaptable. The prose is great, responds well to OOC hints and author's notes, does a good job at details (although it occasionally messes up situation/place/pose)
config-v1b
- Downloads last month
- 588
2-bit
3-bit
4-bit
5-bit
6-bit
8-bit
Model tree for TheDrummer/Precog-24B-v1-GGUF
Base model
mistralai/Mistral-Small-3.1-24B-Base-2503


