Instructions to use robertolofaro/books-model with libraries, notebooks, and local apps.
- Libraries
- llama-cpp-python
How to use robertolofaro/books-model with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="robertolofaro/books-model",
    filename="books-BF16.gguf",
)
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use robertolofaro/books-model with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf robertolofaro/books-model:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf robertolofaro/books-model:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf robertolofaro/books-model:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf robertolofaro/books-model:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf robertolofaro/books-model:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf robertolofaro/books-model:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf robertolofaro/books-model:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf robertolofaro/books-model:Q4_K_M
Use Docker
docker model run hf.co/robertolofaro/books-model:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use robertolofaro/books-model with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "robertolofaro/books-model"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "robertolofaro/books-model",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
Use Docker
docker model run hf.co/robertolofaro/books-model:Q4_K_M
- Ollama
How to use robertolofaro/books-model with Ollama:
ollama run hf.co/robertolofaro/books-model:Q4_K_M
- Unsloth Studio
How to use robertolofaro/books-model with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for robertolofaro/books-model to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for robertolofaro/books-model to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for robertolofaro/books-model to start chatting
- Pi
How to use robertolofaro/books-model with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf robertolofaro/books-model:Q4_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "robertolofaro/books-model:Q4_K_M" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory: pi
- Hermes Agent
How to use robertolofaro/books-model with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf robertolofaro/books-model:Q4_K_M
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default robertolofaro/books-model:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use robertolofaro/books-model with Docker Model Runner:
docker model run hf.co/robertolofaro/books-model:Q4_K_M
- Lemonade
How to use robertolofaro/books-model with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull robertolofaro/books-model:Q4_K_M
Run and chat with the model
lemonade run user.books-model-Q4_K_M
List all available models
lemonade list
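The llama.cpp, vLLM, Pi and Hermes routes above all go through an OpenAI-compatible `/v1/chat/completions` endpoint. As a minimal sketch, assuming `llama-server` is listening on its default port 8080 (the vLLM example above uses 8000), the same request as the curl call can be built and sent from Python using only the standard library. The helper names `build_chat_request` and `send_chat` are hypothetical, not part of any of these tools.

```python
import json
import urllib.request

# Hypothetical helpers (not part of llama.cpp or vLLM) for an
# OpenAI-compatible chat-completions endpoint.
DEFAULT_BASE_URL = "http://localhost:8080/v1"  # llama-server default; vLLM uses port 8000


def build_chat_request(prompt: str,
                       model: str = "robertolofaro/books-model:Q4_K_M",
                       base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Return a POST request for /chat/completions with a single user message."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(request: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]
```

Once a server is running, `print(send_chat(build_chat_request("Which mini-books discuss privacy?")))` prints the answer. The 120-second timeout is an arbitrary choice sized for CPU-only inference.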
Books Q&A and Recommendation Model
DOI: 10.57967/hf/8832
Demo Space: robertolofaro/books (CPU-only, currently private / testing)
Author: Roberto Lofaro
License: CC BY-SA 4.0
Model Overview
This is a GGUF quantisation of Qwen/Qwen3.5-4B, fine-tuned via a structured system prompt and optional retrieval layer to serve as a Q&A and recommendation assistant over a corpus of 12 mini-books and supplementary material by Roberto Lofaro.
The model is designed to answer questions about the mini-books and, primarily, to act as a recommendation system: given a user query or area of interest, it suggests which mini-book(s) are most relevant, all of which are freely available online.
Intended Use
| Use | Supported |
|---|---|
| Interactive Q&A on the 12 mini-books | ✅ |
| Book recommendation based on user interest | ✅ |
| Offline / local inference (CPU) | ✅ |
| General-purpose assistant | ⚠️ Not the primary intent |
| Commercial deployment without attribution | ❌ (see license) |
Primary Task
Given a natural language request such as:
"in which books privacy is discussed?"
the model returns a ranked recommendation with a brief rationale for each suggestion, pointing the user to the relevant freely available resource.
For example, running the request above through the script samples_hf/qa_markdown_faiss_hnsw_externalized.py produces this answer:
Executive Summary: Based on the provided list of 12 mini-books, privacy is explicitly discussed in the book titled "The business side of GDPR," which covers the General Data Protection Regulation and its implementation. Additionally, the book "From #dataleaks to #consentdata" addresses privacy-related topics in the context of data breaches and consent.
Guidelines & Hints:
- The business side of GDPR: This book is not legal advice and requires consultation with a DPO or legal advisors. It focuses on the business side of GDPR compliance, including legacy system upgrades, risk assessment, and building in-house expertise. It targets a business audience interested in assessing risks and identifying frameworks.
- From #dataleaks to #consentdata: This book discusses the transition from data leaks to consent-based data management.
- Note on other books: While other books like "Change," "Business Social Networking," and "Strumenti per la comunicazione e promozione di idee" deal with organizational change and management, they do not explicitly focus on data privacy in the specific context of GDPR or data leakage as defined in the sources above.
Source Recommendation: Recommended Source: The business side of GDPR (Book 8) and From #dataleaks to #consentdata (Book 11).
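When the answer is consumed by another program, the final "Recommended Source" line can be parsed into (title, book number) pairs. This is a sketch that assumes the model keeps the `Title (Book N)` format shown in the example above; that format is elicited by the system prompt and is not guaranteed, and `parse_recommended_sources` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# "(Book 8)" style references, capturing the book number.
BOOK_REF = re.compile(r"\(Book\s+(\d+)\)")
# Separators that may precede a title: commas, semicolons, "and" (possibly combined).
LEADING_SEPARATORS = re.compile(r"^(?:\s*(?:,|;|and\b))*\s*")


def parse_recommended_sources(answer: str) -> List[Tuple[str, int]]:
    """Extract (title, book number) pairs after the last 'Recommended Source:' marker."""
    marker = "Recommended Source:"
    idx = answer.rfind(marker)
    if idx == -1:
        return []
    tail = answer[idx + len(marker):]
    results: List[Tuple[str, int]] = []
    start = 0
    for match in BOOK_REF.finditer(tail):
        # The title is the text between the previous reference and this one.
        title = LEADING_SEPARATORS.sub("", tail[start:match.start()]).strip()
        results.append((title, int(match.group(1))))
        start = match.end()
    return results
```

Applied to the example answer, this yields the two recommended titles with book numbers 8 and 11.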
About the Mini-Books
The 12 mini-books cover topics spanning organisational change, business transformation, knowledge management, AI adoption, and programme management, drawing on the author's 35+ years of experience in consulting and C-level advisory roles across European industrial and public-sector clients.
All mini-books are freely accessible online at robertolofaro.com and associated Patreon / GitHub publications.
The content can also be searched through a tag cloud.
A presentation card for each book, with links and other material on change, is available on robertolofaro.com.
Available Quantisations
| Quantisation | File | Size | Recommended For |
|---|---|---|---|
| Q4_K_M | books-Q4_K_M.gguf | ~2.71 GB | CPU inference, everyday use |
| Q8_0 | books-Q8_0.gguf | ~4.48 GB | Higher fidelity, 8 GB+ RAM |
| BF16 | books-BF16.gguf | ~8.42 GB | Full precision, GPU preferred |
The Q4_K_M variant is recommended for CPU-only environments and is the one used in the companion Space.
Usage
Quick Start with Ollama
ollama run hf.co/robertolofaro/books-model:Q4_K_M
The file samples_hf/qa_common.py contains the system prompt used in the documented tests and in the sample scripts.
The faiss_hnsw and qdrant files are provided for RAG use, and the LoRA adapter is also provided on its own.
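`ollama run` opens an interactive session. For scripted use, Ollama also serves a local REST API (by default on port 11434) whose `/api/chat` endpoint accepts a list of messages. The sketch below assumes that default port and a non-streaming response; the optional `system_prompt` argument is where a prompt such as the one in samples_hf/qa_common.py could be passed, although how that file exposes the prompt is not assumed here. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default local port
MODEL = "hf.co/robertolofaro/books-model:Q4_K_M"      # same tag as `ollama run` above


def build_ollama_request(prompt: str, system_prompt: Optional[str] = None) -> urllib.request.Request:
    """Build a non-streaming /api/chat request, optionally with a system message."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    body = {"model": MODEL, "messages": messages, "stream": False}
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(prompt: str, system_prompt: Optional[str] = None, timeout: float = 120.0) -> str:
    """Send the request and return the assistant message text."""
    request = build_ollama_request(prompt, system_prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["message"]["content"]
```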
Quick Start with llama.cpp
A pre-compiled llama.cpp build that supports Qwen3.5 is shared in the model repository. It was built offline and has been tested offline with Python 3.12.3 on Ubuntu 24.04, and online with Python 3.13 in a Hugging Face Space.
# macOS / Linux
brew install llama.cpp
llama-server -hf robertolofaro/books-model:Q4_K_M
# Windows (WinGet)
winget install llama.cpp
llama-server -hf robertolofaro/books-model:Q4_K_M
Quick Start with llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="robertolofaro/books-model",
    filename="books-Q4_K_M.gguf",
    n_ctx=4096,
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "I am interested in AI adoption in traditional industries. Which mini-books would you recommend?"
        }
    ]
)
print(response["choices"][0]["message"]["content"])
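The example above sends the question without the structured system prompt described in the System Prompt Design section. As a sketch, assuming the prompt text has already been loaded into a string (for example from samples_hf/qa_common.py), it can be passed as a `system` message. `recommend` is a hypothetical wrapper, `llm` is any `llama_cpp.Llama` instance such as the one created above, and the `max_tokens` value is an arbitrary cap.

```python
from typing import Any


def recommend(llm: Any, system_prompt: str, query: str, max_tokens: int = 512) -> str:
    """Ask a llama_cpp.Llama instance for recommendations, guided by a system prompt."""
    response = llm.create_chat_completion(
        messages=[
            # The system message carries the book catalogue and answer format.
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query},
        ],
        max_tokens=max_tokens,  # arbitrary cap on the length of the answer
    )
    return response["choices"][0]["message"]["content"]
```

For example, `print(recommend(llm, system_prompt, "Which books cover programme management?"))`.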
Retrieval-Augmented Variants (samples_hf/)
The repository includes a samples_hf/ folder with three reference implementations that demonstrate different retrieval strategies. The system prompt alone already yields good recommendations; the embedding-based variants add precision for longer or more ambiguous queries.
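The retrieval variants differ in how they find relevant passages; once found, the passages still have to be placed in the prompt. One common pattern, shown here as an assumption rather than as what the samples_hf/ scripts actually do, is to number the retrieved passages and prepend them to the user question, within a rough character budget so the context fits the model's window (`n_ctx=4096` in the llama-cpp-python example). `build_rag_messages` and the 4000-character budget are hypothetical.

```python
from typing import Dict, List, Sequence


def build_rag_messages(system_prompt: str, query: str, passages: Sequence[str],
                       max_chars: int = 4000) -> List[Dict[str, str]]:
    """Return chat messages with retrieved passages prepended to the question."""
    context: List[str] = []
    used = 0
    for i, passage in enumerate(passages, start=1):
        snippet = f"[{i}] {passage.strip()}"
        # Stop before the (rough, character-based) context budget is exceeded.
        if used + len(snippet) > max_chars:
            break
        context.append(snippet)
        used += len(snippet)
    if context:
        user_content = ("Context from the mini-books:\n" + "\n".join(context)
                        + "\n\nQuestion: " + query)
    else:
        # No retrieval results: fall back to the system-prompt-only mode.
        user_content = query
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]
```

Passing an empty list reproduces the system-prompt-only behaviour, which makes the retrieval layer optional.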
Mode A: System Prompt Only (no embeddings)
File: samples_hf/run_no_embeddings.py
Fastest option. Relies entirely on the structured system prompt which encodes descriptions and themes of all 12 mini-books. No vector index required; runs on any machine with llama-cpp-python installed.
python samples_hf/run_no_embeddings.py \
--query "Which books deal with post-merger integration?"
Mode B: FAISS-HNSW Index
File: samples_hf/run_faiss_hnsw.py
Uses a pre-built FAISS index (HNSW graph) over sentence-transformer embeddings of book summaries and chapters. Suitable for environments where FAISS is available and a persistent index is desirable.
# First-run: builds the index (saved locally)
python samples_hf/run_faiss_hnsw.py --build-index
# Subsequent runs: loads existing index
python samples_hf/run_faiss_hnsw.py \
--query "Knowledge management and organisational memory"
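An HNSW index is an approximate nearest-neighbour structure: it returns (approximately) the stored vectors most similar to the query embedding. With a corpus this small, the exact computation it approximates can be written directly. The sketch below is a brute-force cosine-similarity search in plain Python, swapped in for illustration and not the FAISS code used by run_faiss_hnsw.py; the short vectors stand in for sentence-transformer embeddings. On L2-normalised vectors, ranking by L2 distance (the default metric of `faiss.IndexHNSWFlat`) gives the same order as ranking by cosine similarity.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], docs: Sequence[Sequence[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Exact top-k search: score every document, return (index, score) best first."""
    scored = [(i, cosine_similarity(query, doc)) for i, doc in enumerate(docs)]
    # Highest similarity first; ties broken by the lower document index.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]
```

The returned indices would map back to the book summaries or chapters that were embedded, which are then passed to the model as context.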
Mode C: Qdrant Vector Store
File: samples_hf/run_qdrant.py
Uses a local Qdrant instance (or Qdrant Cloud) as the vector store. Preferred for production-style deployments or when you want persistence, filtering, and collection management.
# Start Qdrant locally (Docker)
docker run -p 6333:6333 qdrant/qdrant
# Upsert embeddings and query
python samples_hf/run_qdrant.py \
--query "Programme management under uncertainty"
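Upserting and querying a Qdrant collection comes down to a few REST calls against the instance started above on port 6333. The sketch below only builds the requests, using the standard library. The collection name `books` and the vector size are assumptions (the size must match whatever embedding model produces the vectors), and the helper names are hypothetical rather than taken from run_qdrant.py, which may use the `qdrant-client` package instead.

```python
import json
import urllib.request
from typing import Any, Dict, Iterable, Sequence, Tuple

QDRANT_URL = "http://localhost:6333"  # matches the docker run port mapping above
COLLECTION = "books"                  # hypothetical collection name


def _request(method: str, path: str, body: Dict[str, Any]) -> urllib.request.Request:
    """JSON request against the Qdrant REST API."""
    return urllib.request.Request(
        QDRANT_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def create_collection_request(vector_size: int) -> urllib.request.Request:
    """Create the collection with cosine distance."""
    body = {"vectors": {"size": vector_size, "distance": "Cosine"}}
    return _request("PUT", f"/collections/{COLLECTION}", body)


def upsert_request(points: Iterable[Tuple[int, Sequence[float], Dict[str, Any]]]) -> urllib.request.Request:
    """Insert or update points given as (id, vector, payload) tuples."""
    body = {"points": [{"id": pid, "vector": list(vec), "payload": payload}
                       for pid, vec, payload in points]}
    return _request("PUT", f"/collections/{COLLECTION}/points", body)


def search_request(query_vector: Sequence[float], limit: int = 3) -> urllib.request.Request:
    """Nearest-neighbour search returning payloads (e.g. book titles)."""
    body = {"vector": list(query_vector), "limit": limit, "with_payload": True}
    return _request("POST", f"/collections/{COLLECTION}/points/search", body)
```

Each request is sent with `urllib.request.urlopen(...)`: create the collection once, upsert one point per embedded summary or chapter, then call `search_request` with the query embedding.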
Sample Execution Output
samples_hf/ also contains example output from a pre-run execution, showing the expected model output for a representative set of queries; it is useful for calibrating expectations before running inference locally.
System Prompt Design
The model is configured with a structured system prompt that:
- Lists all 12 mini-books with title, key themes, and target audience
- Instructs the model to reason about relevance before responding
- Formats recommendations as a ranked list with a one-paragraph rationale per book
- Directs the user to the free online access point for each suggestion
The system prompt is included in all three samples_hf/ scripts and can be adapted independently of the quantisation used.
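The four properties listed above can be illustrated with a small prompt builder. This is a hypothetical reconstruction for illustration only: the actual prompt text lives in samples_hf/qa_common.py, and the `MiniBook` fields, the wording and any links passed in are assumptions. Keeping the catalogue as data makes it easy to adapt the prompt independently of the quantisation.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class MiniBook:
    number: int
    title: str
    themes: str
    audience: str
    url: str


def build_system_prompt(books: Iterable[MiniBook]) -> str:
    """Assemble a system prompt covering the four design points listed above."""
    lines = [
        "You are a recommendation assistant for the mini-books by Roberto Lofaro.",
        "Only recommend books from this list:",
    ]
    # 1. Title, key themes and target audience for every mini-book.
    for book in books:
        lines.append(
            f"- Book {book.number}: {book.title}. Themes: {book.themes}. "
            f"Audience: {book.audience}. Free access: {book.url}"
        )
    lines += [
        # 2. Reason about relevance before responding.
        "Before answering, decide which books are relevant to the request and why.",
        # 3. Ranked list with a one-paragraph rationale per book.
        "Answer with a ranked list and a one-paragraph rationale for each book.",
        # 4. Point the user to the free online access point.
        "For every recommended book, give its free online access point.",
    ]
    return "\n".join(lines)
```

For example, `build_system_prompt([MiniBook(8, "The business side of GDPR", "GDPR compliance, risk assessment", "business readers", "<link>")])` produces a six-line prompt whose catalogue entry starts with `- Book 8: The business side of GDPR.`.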
Companion Space
A Gradio-based interactive demo is available at:
robertolofaro/books
The Space runs the Q4_K_M quantisation on CPU hardware (no GPU required). It is currently private / under testing and will be made public once validated.
Limitations
- Recommendations are bounded by the 12 mini-books in the corpus; the model will not recommend external works.
- The model does not have live internet access; content reflects the corpus as indexed at build time.
- CPU inference with Q4_K_M typically yields response times of 15 to 60 seconds depending on hardware; Q8_0 / BF16 benefit from GPU acceleration.
- The BF16 variant may exhibit minor hallucinations on very specific factual queries about book content; Q4_K_M is slightly more conservative.
Ethical Considerations
- The corpus consists entirely of original works by the author; no third-party copyrighted content is embedded.
- The recommendation system is informational; it does not collect user data.
- The model inherits any biases present in the Qwen3.5-4B base model; users should apply standard critical judgement to outputs.
Citation
If you use this model or the associated scripts in research or derivative work, please cite:
@misc{lofaro2025booksmodel,
  author = {Roberto Lofaro},
  title  = {Books Q\&A and Recommendation Model},
  year   = {2025},
  doi    = {10.57967/hf/8832},
  url    = {https://huggingface.co/robertolofaro/books-model},
  note   = {GGUF quantisation of Qwen3.5-4B, fine-tuned for book recommendation via structured system prompt and optional retrieval (FAISS-HNSW / Qdrant)}
}
License
This model card and associated scripts are released under CC BY-SA 4.0.
The base model weights are subject to the Qwen3 License.
Published openly as part of Roberto Lofaro's AI-assisted knowledge production initiative.
GitHub · Patreon · robertolofaro.com