Instructions to use InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF with Transformers:

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF", dtype="auto")

llama-cpp-python

How to use InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF",
	filename="Excalibur-7b-DPO-iMat-IQ2_M.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF:Q4_K_M

Use Docker

docker model run hf.co/InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF:Q4_K_M

LM Studio
Jan
Ollama
How to use InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF with Ollama:
```
ollama run hf.co/InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF:Q4_K_M
```

Unsloth Studio new

How to use InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF to start chatting

Docker Model Runner
How to use InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF with Docker Model Runner:
```
docker model run hf.co/InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF:Q4_K_M
```

Lemonade

How to use InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Excalibur-7b-DPO-iMat-GGUF-Q4_K_M

List all available models

lemonade list

Excalibur-7b-DPO-iMat-GGUF

Quantized from fp32 with love.

iMatrix .dat file was calculated using groups_merged.txt.

FP16 available here

An initial foray into the world of fine-tuning. The goal of this release was to amplify the quality of the original model's responses, in particular for vision use cases*

Notes & Methodology

Excalibur-7b fine-tuned with Direct Preference Optimization (DPO) using Intel/orca_dpo_pairs
This is a quick experiment to determine the impact of DPO finetuning on the original base model
Ran for a little over an hour on a single A100
Internal benchmarks showed improvement over base model, awaiting final results
Precision: bfloat16

Sample Question - Vision

*Requires additional mmproj file. You have two options for vision functionality (available inside original repo or linked below):

Select the gguf file of your choice in Kobold as usual, then make sure to choose the mmproj file above in the LLaVA mmproj field of the model submenu:

Prompt Format

For best results please use ChatML for the prompt format. Alpaca may also work.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	73.84
AI2 Reasoning Challenge (25-Shot)	70.90
HellaSwag (10-Shot)	87.93
MMLU (5-Shot)	65.46
TruthfulQA (0-shot)	70.82
Winogrande (5-shot)	82.48
GSM8k (5-shot)	65.43

Downloads last month: 175

GGUF

Model size

7B params

Architecture

llama

Hardware compatibility

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF

Base model

InferenceIllusionist/Excalibur-7b

Finetuned

InferenceIllusionist/Excalibur-7b-DPO

Quantized

(6)

this model

Dataset used to train InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF

Collection including InferenceIllusionist/Excalibur-7b-DPO-iMat-GGUF

GGUFs

Collection

I take requests, feel free to drop me a line in the community posts • 31 items • Updated 22 days ago • 3