Instructions to use TheDrummer/Precog-24B-v1-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use TheDrummer/Precog-24B-v1-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="TheDrummer/Precog-24B-v1-GGUF",
	filename="Precog-24B-v1b-Q2_K.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use TheDrummer/Precog-24B-v1-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf TheDrummer/Precog-24B-v1-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf TheDrummer/Precog-24B-v1-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf TheDrummer/Precog-24B-v1-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf TheDrummer/Precog-24B-v1-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf TheDrummer/Precog-24B-v1-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf TheDrummer/Precog-24B-v1-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf TheDrummer/Precog-24B-v1-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf TheDrummer/Precog-24B-v1-GGUF:Q4_K_M

Use Docker

docker model run hf.co/TheDrummer/Precog-24B-v1-GGUF:Q4_K_M

LM Studio
Jan
Ollama
How to use TheDrummer/Precog-24B-v1-GGUF with Ollama:
```
ollama run hf.co/TheDrummer/Precog-24B-v1-GGUF:Q4_K_M
```

Unsloth Studio new

How to use TheDrummer/Precog-24B-v1-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for TheDrummer/Precog-24B-v1-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for TheDrummer/Precog-24B-v1-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for TheDrummer/Precog-24B-v1-GGUF to start chatting

Pi new

How to use TheDrummer/Precog-24B-v1-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf TheDrummer/Precog-24B-v1-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "TheDrummer/Precog-24B-v1-GGUF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use TheDrummer/Precog-24B-v1-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf TheDrummer/Precog-24B-v1-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default TheDrummer/Precog-24B-v1-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use TheDrummer/Precog-24B-v1-GGUF with Docker Model Runner:
```
docker model run hf.co/TheDrummer/Precog-24B-v1-GGUF:Q4_K_M
```

Lemonade

How to use TheDrummer/Precog-24B-v1-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull TheDrummer/Precog-24B-v1-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Precog-24B-v1-GGUF-Q4_K_M

List all available models

lemonade list

BeaverAI

More than 8000 members strong 💪 A hub for users and makers alike!

Drummer is open for new opportunities: https://linktr.ee/thelocaldrummer

Thank you to everyone who subscribed through Patreon. Your support helps me chug along in this brave new world.

FAQ for those out-of-the-loop

🐶 Who is Drummer?

Hi! I'm Drummer. I'm a Software Engineer with experience in JavaScript, Golang, Python, and generally engineering the crap out of things.

Why I'm in the AI space:

Exploration: Everyone is trying to figure out how AI works and what it's capable of. I am too - just not in creating the smartest, safest model at all costs.
Upskill: The world is headed towards AI. It is here to stay. This has been my way of brushing up in this new form of computing challenge.
Value: I yearn to create value. I feel satisfaction and fulfillment in providing something meaningful for others.
Fun: It's just fun using and making models. It's also fun coming up with theories and realizing them in practice (training AI).

I started my tuning venture back in mid-2024 when I wanted to improve its literary capabilities. I've come a long way since then and I have branched out and specialized. Foundational models today are optimized for non-creative uses, and I believe there is a place for AI in creativity and entertainment.

I am here to take the road less traveled by.

❓ What are my models like?

Bottomline: My models are usually geared towards creativity, usability, and entertainment!

While intelligence, correctness, and problem solving are not my priority, they are still one of many qualities I want in my models.

The primary goal is to enhance the experience for users looking to use models for creative uses, and other use cases which require no alignment.

In an effort to make it clear to myself and to others what I'm aiming for, I've identified certain qualities that my users often want:

Creativity

Writing: Does it string together words and sentences in a pleasant & effective way? Does it feel like a writer?
Dynamism: How good is the AI at being compelling and intriguing in its storytelling?
Imagination: Can the AI navigate through a plethora of possibilities? Can it skirt incoherence and rise up to absolute coherence at the end of it?

(Dis)alignment

Attitude: Does it refuse in both soft or hard ways? Does it lean towards certain corporate/religious/political ethics & beliefs? How does it see the user and itself?
Morality: Does it know ethics? Is its language infected with forced positivity? If not, can it still moralize over difficult & dubious themes?
Formatting: How stubborn is it with its established formatting? Can it create effective and novel formats to answer the prompt?

Intelligence

Adherence: Can it follow instructions? Is it sticking to the prompt? Can it understsand you?
Knowledge: Does it know about the world in both fictional and non-fictional way?
Perception: Can it handle nuance, complexity, and logic?

If it doesn't excel in one of these qualities, or if it's overall mediocre for its size, then I would most likely reiterate until I get something right.

💡 Philosophy

A person is defined by the language they use. Not whether they speak in English or German, but in how they perceive reality.

Just like how we associate a serial killer as a mind that can't map 'murder' to 'evil', an innocent person is a mind that simply can't imagine 'murder'. They get confused when forced to deal with such subjects.

AI's use of language speaks volumes about their 'perception' of reality. If a language model has been skewed and limited to a positive perception, then it's ability to imagine is also limited.

Finetuning is an opportunity to adjust and broaden the language. Corporations use it to achieve safety and compliance. I'm here to ACK-

TheDrummer proudly presents...

Precog 24B v1

Description

Precog is a 'reasoning' model that thinks in a different way. Instead of breaking down the question and coming up with a solution, it will instead provide an overview of the response like:

The intention is to have the model write a simple, digestible draft and then using the draft to write the actual, complicated response. The draft can be prefilled and edited to influence the actual response. I've always wondered what would happen if the model had solid basis for what it's about to write (as an actual response).

Strengths

Improved narrative flow and storytelling.
Provides a short draft you can inspect / modify before the AI writes the actual response, saving time and effort.
Allows the model to plan out the response without spending too much tokens on it.
Possibly better prompt adherence / instruction following.
Uses the standard <think> format, meaning it's plug & play for most frontends once configured for reasoning.

Weaknesses

The model's 'reasoning' / draft might not always align with the actual response.
Conventional reasoning like Behemoth R1 / Cydonia R1 might do better in detecting and portraying nuances.

Usage

Mistral v7 Tekken
Prefill <think> in case it doesn't think.
Make sure any additional prefills inside <think> works well with the new reasoning pattern.
Like my other reasoning tunes, you can invent new think tags like <evil_think> or <slowburn_think> to influence how it should write.

Feedback

Cydonia R1 4.1 does nuance on a moment to moment basis good and has given me some really "human" responses that way. Whereas Precog reasoning is lazer sharp, super quick, and more RP context aware. Precog seems to remember what its doing better and execute in a logical sequence. E.g move a little closer, sit down, cross legs.

This is quite good. The reasoning is great, short and more of a brief outline of the output which is embellished from the outline. The prose isn't wildly creative but the model seems pretty good at keeping a narrative going while not forgetting important char and plot details from earlier outputs. Seems pretty insensitive to temperature setting, didn't really feel that 1 temp was much more creative than .75, feels like the reasoning really locks it into adhering to the prompt regardless of the temp. Which is fine by me, I'll take prompt adherence with mildly creative prose over wild all over the place prose.

First reasoning model that really felt like its reasoning was finetuned for rp/stories. Earlier Drummer reasoning tunes seemed like the reasoning was treating rp prompts/cards like an equation or science problem to solve for, whereas the reasoning here is a skeletal synopsis of the final output which makes much more sense for reasoning in rp/storywriting.

This model, however, is more interesting. It first creates a short plot development plan and then expands it in the appropriate response. In my opinion, it works great. This is the first 24B model where I can easily reach a 28k context window and it's quite stable. The quality of the prose is... sufficient, at least for me.

Liked this one so much I used it all evening. Thinking is really good, and the thinking is really adaptable. The prose is great, responds well to OOC hints and author's notes, does a good job at details (although it occasionally messes up situation/place/pose)

config-v1b

Downloads last month: 588

GGUF

Model size

24B params

Architecture

llama

Hardware compatibility

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for TheDrummer/Precog-24B-v1-GGUF

Base model

mistralai/Mistral-Small-3.1-24B-Base-2503

Finetuned

mistralai/Mistral-Small-3.2-24B-Instruct-2506

Finetuned

mistralai/Magistral-Small-2509

Quantized

(25)

this model

TheDrummer
/

Precog-24B-v1-GGUF