You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Books Q&A and Recommendation Model

DOI: 10.57967/hf/8832
Demo Space: robertolofaro/books (CPU-only, currently private / testing)
Author: Roberto Lofaro
License: CC BY-SA 4.0

Model Overview

This is a GGUF quantisation of Qwen/Qwen3.5-4B, fine-tuned via a structured system prompt and optional retrieval layer to serve as a Q&A and recommendation assistant over a corpus of 12 mini-books and supplementary material by Roberto Lofaro.

The model is designed to answer questions about the mini-books and, primarily, to act as a recommendation system: given a user query or area of interest, it suggests which mini-book(s) are most relevant — all of which are freely available online.

Intended Use

| Use | Supported |
| --- | --- |
| Interactive Q&A on the 12 mini-books | ✅ |
| Book recommendation based on user interest | ✅ |
| Offline / local inference (CPU) | ✅ |
| General-purpose assistant | ⚠️ Not the primary intent |
| Commercial deployment without attribution | ❌ (see license) |

Primary Task

Given a natural language request such as:

"in which books privacy is discussed?"

the model returns a ranked recommendation with a brief rationale for each suggestion, pointing the user to the relevant freely available resource.

For example, running the request above through the script samples_hf/qa_markdown_faiss_hnsw_externalized.py produces the following answer:

Executive Summary: Based on the provided list of 12 mini-books, privacy is explicitly discussed in the book titled "The business side of GDPR," which covers the General Data Protection Regulation and its implementation. Additionally, the book "From #dataleaks to #consentdata" addresses privacy-related topics in the context of data breaches and consent.

Guidelines & Hints:

The business side of GDPR: This book is not legal advice and requires consultation with a DPO or legal advisors. It focuses on the business side of GDPR compliance, including legacy system upgrades, risk assessment, and building in-house expertise. It targets a business audience interested in assessing risks and identifying frameworks.
From #dataleaks to #consentdata: This book discusses the transition from data leaks to consent-based data management.
Note on other books: While other books like "Change," "Business Social Networking," and "Strumenti per la comunicazione e promozione di idee" deal with organizational change and management, they do not explicitly focus on data privacy in the specific context of GDPR or data leakage as defined in the sources above.

Source Recommendation: Recommended Source: The business side of GDPR (Book 8) and From #dataleaks to #consentdata (Book 11).

About the Mini-Books

The 12 mini-books cover topics spanning organisational change, business transformation, knowledge management, AI adoption, and programme management, drawing on the author's 35+ years of experience in consulting and C-level advisory roles across European industrial and public-sector clients.

All mini-books are freely accessible online at robertolofaro.com and associated Patreon / GitHub publications.

You can search the content by tag cloud.

A presentation card with links for each book, along with other material on change, is available on robertolofaro.com.

Available Quantisations

| Quantisation | File | Size | Recommended For |
| --- | --- | --- | --- |
| Q4_K_M | `books-Q4_K_M.gguf` | ~2.71 GB | CPU inference, everyday use |
| Q8_0 | `books-Q8_0.gguf` | ~4.48 GB | Higher fidelity, 8 GB+ RAM |
| BF16 | `books-BF16.gguf` | ~8.42 GB | Full precision, GPU preferred |

The Q4_K_M variant is recommended for CPU-only environments and is the one used in the companion Space.
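
As a rough illustration of how the table can guide a choice, the sketch below picks the largest file that fits a given memory budget. The file names and sizes come from the table; the function name `pick_quantisation` and the 1.5x headroom factor (to leave room for the context cache and the rest of the process) are assumptions made for this example, not values taken from the repository.

```python
from typing import Optional

# File sizes in GB, copied from the quantisation table.
QUANT_SIZES_GB = {
    "books-Q4_K_M.gguf": 2.71,
    "books-Q8_0.gguf": 4.48,
    "books-BF16.gguf": 8.42,
}


def pick_quantisation(available_gb: float, headroom: float = 1.5) -> Optional[str]:
    """Return the largest file whose size times `headroom` fits in memory.

    `headroom` is an assumed safety factor, not a measured requirement.
    Returns None when even the smallest file does not fit.
    """
    fitting = [
        (size, name)
        for name, size in QUANT_SIZES_GB.items()
        if size * headroom <= available_gb
    ]
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for ram_gb in (4, 8, 16):
        print(ram_gb, "GB ->", pick_quantisation(ram_gb))
```

With this headroom an 8 GB machine resolves to the Q8_0 file, which is consistent with the "8 GB+ RAM" note in the table; a tighter or looser factor may suit your hardware better.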

Usage

Quick Start with Ollama

ollama run hf.co/robertolofaro/books-model:Q4_K_M

The file samples_hf/qa_common.py contains the system prompt used in the documented tests and in the sample scripts provided.

The faiss_hnsw and qdrant files are provided for RAG use, and the LoRA is also provided on its own.

Quick Start with llama.cpp

A pre-compiled llama.cpp build that supports Qwen3.5 is shared in the model repository. It was built offline and has been tested offline with Python 3.12.3 under Ubuntu 24.04, and online with Python 3.13 within a Hugging Face Space.

# macOS / Linux
brew install llama.cpp
llama-server -hf robertolofaro/books-model:Q4_K_M

# Windows (WinGet)
winget install llama.cpp
llama-server -hf robertolofaro/books-model:Q4_K_M
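
Once `llama-server` is running, it can be queried from Python through its OpenAI-compatible chat endpoint. The following is a minimal sketch using only the standard library. It assumes the server listens on its default port 8080 on localhost; the helper names `build_chat_payload` and `ask` are illustrative, and the generous timeout reflects the CPU response times mentioned under Limitations.

```python
import json
import urllib.request

# Assumption: llama-server was started locally with its default port (8080).
BASE_URL = "http://localhost:8080/v1"


def build_chat_payload(query: str) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": "robertolofaro/books-model:Q4_K_M",
        "messages": [{"role": "user", "content": query}],
    }


def ask(query: str, base_url: str = BASE_URL, timeout: float = 120.0) -> str:
    """POST the query to the server and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(query)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(ask("Which books deal with post-merger integration?"))` would print the model's recommendation; nothing is sent until `ask` is called.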

Quick Start with llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="robertolofaro/books-model",
    filename="books-Q4_K_M.gguf",
    n_ctx=4096,
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "I am interested in AI adoption in traditional industries. Which mini-books would you recommend?"
        }
    ]
)
print(response["choices"][0]["message"]["content"])
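
The quick start above sends only a user message. Since the recommendations rely on the structured system prompt, a call would normally put that prompt first in the message list. The sketch below shows one way to do this; `SYSTEM_PROMPT_PLACEHOLDER`, `build_messages`, and `recommend` are hypothetical names, and the placeholder string must be replaced with the actual prompt text from samples_hf/qa_common.py.

```python
from typing import Dict, List

# Placeholder only: substitute the prompt text found in samples_hf/qa_common.py.
SYSTEM_PROMPT_PLACEHOLDER = "<system prompt text from samples_hf/qa_common.py>"


def build_messages(
    query: str, system_prompt: str = SYSTEM_PROMPT_PLACEHOLDER
) -> List[Dict[str, str]]:
    """Return a chat message list with the system prompt ahead of the query."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query.strip()},
    ]


def recommend(llm, query: str, max_tokens: int = 512) -> str:
    """Run one chat completion; `llm` is expected to be a llama_cpp.Llama instance."""
    response = llm.create_chat_completion(
        messages=build_messages(query),
        max_tokens=max_tokens,
    )
    return response["choices"][0]["message"]["content"]
```

Calling `recommend(llm, "Which mini-books cover AI adoption?")` with the `llm` object created in the quick start would then return the reply text. The `max_tokens` value of 512 is an arbitrary cap chosen for the example.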

Retrieval-Augmented Variants (`samples_hf/`)

The repository includes a samples_hf/ folder with three reference implementations that demonstrate different retrieval strategies. The system prompt alone already yields good recommendations; the embedding-based variants add precision for longer or more ambiguous queries.

Mode A — System Prompt Only (no embeddings)

File: samples_hf/run_no_embeddings.py

Fastest option. Relies entirely on the structured system prompt, which encodes descriptions and themes of all 12 mini-books. No vector index is required; it runs on any machine with llama-cpp-python installed.

python samples_hf/run_no_embeddings.py \
  --query "Which books deal with post-merger integration?"

Mode B — FAISS-HNSW Index

File: samples_hf/run_faiss_hnsw.py

Uses a pre-built FAISS index (HNSW graph) over sentence-transformer embeddings of book summaries and chapters. Suitable for environments where FAISS is available and a persistent index is desirable.

# First-run: builds the index (saved locally)
python samples_hf/run_faiss_hnsw.py --build-index

# Subsequent runs: loads existing index
python samples_hf/run_faiss_hnsw.py \
  --query "Knowledge management and organisational memory"
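
To make the retrieval step concrete, here is a small standard-library sketch of what a vector index does conceptually: score every stored passage against the query embedding and keep the best matches. Note that this is an exact brute-force cosine search, not HNSW (which approximates the same ranking with a graph to stay fast on large collections), and it is not the code in samples_hf/run_faiss_hnsw.py. The two-dimensional vectors in the usage note are toy values standing in for sentence-transformer embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    passages: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[float, str]]:
    """Return the k passages most similar to the query as (score, text) pairs."""
    scored = [(cosine(query_vec, vec), text) for text, vec in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

For instance, `top_k([1.0, 0.0], [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])], k=2)` ranks "a" first and "c" second. The retrieved passages would then be added to the prompt before the model is called.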

Mode C — Qdrant Vector Store

File: samples_hf/run_qdrant.py

Uses a local Qdrant instance (or Qdrant Cloud) as the vector store. Preferred for production-style deployments or when you want persistence, filtering, and collection management.

# Start Qdrant locally (Docker)
docker run -p 6333:6333 qdrant/qdrant

# Upsert embeddings and query
python samples_hf/run_qdrant.py \
  --query "Programme management under uncertainty"
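
For readers who want to inspect the vector store directly, the sketch below queries Qdrant over HTTP with the standard library. It assumes the instance from the Docker command above is reachable on localhost:6333 and uses Qdrant's REST search endpoint for a collection; the collection name `books` and the helper names are hypothetical, and the endpoint should be checked against the Qdrant version in use. The query vector must come from the same embedding model that was used when upserting.

```python
import json
import urllib.request
from typing import List

# Assumptions: local Qdrant on port 6333; the collection name is hypothetical.
QDRANT_URL = "http://localhost:6333"
COLLECTION = "books"


def build_search_body(vector: List[float], limit: int = 3) -> dict:
    """Build the JSON body for a nearest-neighbour search request."""
    return {"vector": vector, "limit": limit, "with_payload": True}


def search(vector: List[float], limit: int = 3, timeout: float = 30.0) -> list:
    """Send the search request and return the list of scored points."""
    request = urllib.request.Request(
        f"{QDRANT_URL}/collections/{COLLECTION}/points/search",
        data=json.dumps(build_search_body(vector, limit)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["result"]
```

No request is sent until `search` is called, so the module can be imported without a running Qdrant instance.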

Sample Execution Output

samples_hf/ also contains pre-run execution results showing the expected model output for a representative set of queries, which is useful for calibrating expectations before running inference locally.

System Prompt Design

The model is configured with a structured system prompt that:

Lists all 12 mini-books with title, key themes, and target audience
Instructs the model to reason about relevance before responding
Formats recommendations as a ranked list with a one-paragraph rationale per book
Directs the user to the free online access point for each suggestion

The system prompt is included in all three samples_hf/ scripts and can be adapted independently of the quantisation used.
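
The four design points above can be sketched as a small prompt builder. This is only an illustration of the structure, not the prompt shipped in the repository: the field names, the wording of the instructions, and the function name `build_system_prompt` are assumptions, and the single catalogue entry reuses details from the sample answer shown under Primary Task.

```python
from typing import Dict, List

# Illustrative entry only; details are taken from the sample answer in this card.
BOOKS: List[Dict] = [
    {
        "number": 8,
        "title": "The business side of GDPR",
        "themes": ["GDPR compliance", "risk assessment", "legacy system upgrades"],
        "audience": "business audience",
    },
]


def build_system_prompt(books: List[Dict]) -> str:
    """Assemble a catalogue section followed by response instructions."""
    lines = ["You recommend mini-books from the catalogue below.", "", "Catalogue:"]
    for book in books:
        themes = ", ".join(book["themes"])
        lines.append(
            f"{book['number']}. {book['title']} | themes: {themes} "
            f"| audience: {book['audience']}"
        )
    lines += [
        "",
        "Instructions:",
        "1. Reason about which books are relevant before answering.",
        "2. Return a ranked list with a one-paragraph rationale per book.",
        "3. Point the user to the free online access point for each suggestion.",
        "4. Recommend only books from the catalogue.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_system_prompt(BOOKS))
```

Keeping the catalogue as data and the instructions as fixed text makes it straightforward to update a book entry without touching the response format.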

Companion Space

A Gradio-based interactive demo is available at:

🔗 robertolofaro/books

The Space runs the Q4_K_M quantisation on CPU hardware (no GPU required). It is currently private / under testing and will be made public once validated.

Limitations

Recommendations are bounded by the 12 mini-books in the corpus; the model will not recommend external works.
The model does not have live internet access; content reflects the corpus as indexed at build time.
CPU inference with Q4_K_M typically yields response times of 15–60 seconds depending on hardware; Q8_0 / BF16 benefit from GPU acceleration.
The BF16 variant may exhibit minor hallucinations on very specific factual queries about book content; Q4_K_M is slightly more conservative.

Ethical Considerations

The corpus consists entirely of original works by the author; no third-party copyrighted content is embedded.
The recommendation system is informational; it does not collect user data.
The model inherits any biases present in the Qwen3.5-4B base model; users should apply standard critical judgement to outputs.

Citation

If you use this model or the associated scripts in research or derivative work, please cite:

@misc{lofaro2025booksmodel,
  author       = {Roberto Lofaro},
  title        = {Books Q\&A and Recommendation Model},
  year         = {2025},
  doi          = {10.57967/hf/8832},
  url          = {https://huggingface.co/robertolofaro/books-model},
  note         = {GGUF quantisation of Qwen3.5-4B, fine-tuned for book recommendation via structured system prompt and optional retrieval (FAISS-HNSW / Qdrant)}
}

License

This model card and associated scripts are released under CC BY-SA 4.0.
The base model weights are subject to the Qwen3 License.

Published openly as part of Roberto Lofaro's AI-assisted knowledge production initiative.
GitHub · Patreon · robertolofaro.com

Downloads last month: 49

GGUF
Model size: 4B params
Architecture: qwen35
Hardware compatibility: 4-bit, 8-bit, 16-bit

Model tree for robertolofaro/books-model
Base model: Qwen/Qwen3.5-4B-Base
Finetuned: Qwen/Qwen3.5-4B
Quantized (189): this model
