Books Q&A and Recommendation Model

DOI: 10.57967/hf/8832
Demo Space: robertolofaro/books (CPU-only, currently private / testing)
Author: Roberto Lofaro
License: CC BY-SA 4.0


Model Overview

This is a GGUF quantisation of a fine-tune of Qwen/Qwen3.5-4B. Together with a structured system prompt and an optional retrieval layer, it serves as a Q&A and recommendation assistant over a corpus of 12 mini-books and supplementary material by Roberto Lofaro.

The model is designed to answer questions about the mini-books and, primarily, to act as a recommendation system: given a user query or area of interest, it suggests which mini-book(s) are most relevant. All of them are freely available online.


Intended Use

| Use | Supported |
|---|---|
| Interactive Q&A on the 12 mini-books | ✅ |
| Book recommendation based on user interest | ✅ |
| Offline / local inference (CPU) | ✅ |
| General-purpose assistant | ⚠️ Not the primary intent |
| Commercial deployment without attribution | ❌ (see license) |

Primary Task

Given a natural language request such as:

"in which books privacy is discussed?"

the model returns a ranked recommendation with a brief rationale for each suggestion, pointing the user to the relevant freely available resource.

For example, running the request above through the script samples_hf/qa_markdown_faiss_hnsw_externalized.py produces this answer:

Executive Summary: Based on the provided list of 12 mini-books, privacy is explicitly discussed in the book titled "The business side of GDPR," which covers the General Data Protection Regulation and its implementation. Additionally, the book "From #dataleaks to #consentdata" addresses privacy-related topics in the context of data breaches and consent.

Guidelines & Hints:

  • The business side of GDPR: This book is not legal advice and requires consultation with a DPO or legal advisors. It focuses on the business side of GDPR compliance, including legacy system upgrades, risk assessment, and building in-house expertise. It targets a business audience interested in assessing risks and identifying frameworks.
  • From #dataleaks to #consentdata: This book discusses the transition from data leaks to consent-based data management.
  • Note on other books: While other books like "Change," "Business Social Networking," and "Strumenti per la comunicazione e promozione di idee" deal with organizational change and management, they do not explicitly focus on data privacy in the specific context of GDPR or data leakage as defined in the sources above.

Source Recommendation: Recommended Source: The business side of GDPR (Book 8) and From #dataleaks to #consentdata (Book 11).
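
The sample answer above has three labelled parts (Executive Summary, Guidelines & Hints, Source Recommendation). If you want to post-process answers, for instance to show only the final recommendation, a minimal sketch of a splitter could look like the following. The helper name `split_answer` is hypothetical, and treating the three labels as stable across all answers is an assumption based on this single example.

```python
import re

# Section labels taken from the sample answer; assuming every answer uses them
# is an assumption, not something the model card guarantees.
SECTION_LABELS = ("Executive Summary", "Guidelines & Hints", "Source Recommendation")


def split_answer(answer: str) -> dict[str, str]:
    """Split a model answer into its labelled sections (hypothetical helper)."""
    # A label only counts when it starts a line and is followed by a colon.
    pattern = re.compile(
        r"^[ \t]*(" + "|".join(re.escape(label) for label in SECTION_LABELS) + r")[ \t]*:",
        re.MULTILINE,
    )
    matches = list(pattern.finditer(answer))
    sections: dict[str, str] = {}
    for i, match in enumerate(matches):
        # A section runs until the next label, or to the end of the answer.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(answer)
        sections[match.group(1)] = answer[match.end():end].strip()
    return sections


sample = (
    "Executive Summary: Privacy is discussed in two books.\n\n"
    "Guidelines & Hints:\n"
    "  - The business side of GDPR: business-side compliance.\n\n"
    "Source Recommendation: The business side of GDPR (Book 8)."
)
parts = split_answer(sample)
print(parts["Source Recommendation"])
```

Labels that are missing from an answer are simply absent from the returned dictionary, so callers should use `parts.get(...)` when the layout is not guaranteed.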


About the Mini-Books

The 12 mini-books cover topics spanning organisational change, business transformation, knowledge management, AI adoption, and programme management, drawing on the author's 35+ years of experience in consulting and C-level advisory roles across European industrial and public-sector clients.

All mini-books are freely accessible online at robertolofaro.com and associated Patreon / GitHub publications.

You can search the content by tag cloud.

A presentation card for each book, with links and other material on change, is available on robertolofaro.com.


Available Quantisations

| Quantisation | File | Size | Recommended For |
|---|---|---|---|
| Q4_K_M | books-Q4_K_M.gguf | ~2.71 GB | CPU inference, everyday use |
| Q8_0 | books-Q8_0.gguf | ~4.48 GB | Higher fidelity, 8 GB+ RAM |
| BF16 | books-BF16.gguf | ~8.42 GB | Full precision, GPU preferred |

The Q4_K_M variant is recommended for CPU-only environments and is the one used in the companion Space.
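
As a rough illustration of the guidance above, the choice of file can be written as a small helper. This is only a sketch: `pick_quantisation` is a hypothetical name, the 8 GB threshold comes from the "8 GB+ RAM" note for Q8_0, and the 10 GB GPU-memory threshold for BF16 is an assumption (the ~8.42 GB file plus some headroom), not a figure from this card.

```python
# File names copied from the quantisation list in this card.
FILES = {
    "Q4_K_M": "books-Q4_K_M.gguf",
    "Q8_0": "books-Q8_0.gguf",
    "BF16": "books-BF16.gguf",
}


def pick_quantisation(ram_gb: float, gpu_memory_gb: float = 0.0) -> str:
    """Suggest a GGUF file for a machine (hypothetical helper, rough thresholds)."""
    if gpu_memory_gb >= 10:  # assumption: ~8.42 GB of weights plus headroom
        return FILES["BF16"]
    if ram_gb >= 8:  # "8 GB+ RAM" is the card's note for Q8_0
        return FILES["Q8_0"]
    return FILES["Q4_K_M"]  # the card's recommendation for CPU-only use


print(pick_quantisation(ram_gb=4))
print(pick_quantisation(ram_gb=16))
print(pick_quantisation(ram_gb=16, gpu_memory_gb=12))
```

The returned name can be passed as the `filename` argument when loading the model from the repository.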


Usage

Quick Start with Ollama

ollama run hf.co/robertolofaro/books-model:Q4_K_M

The file samples_hf/qa_common.py contains the system prompt used in the documented tests and in the sample scripts provided.

The faiss_hnsw and qdrant files are provided for RAG use, and the LoRA is also provided on its own.

Quick Start with llama.cpp

A pre-compiled llama.cpp build that supports Qwen3.5 is shared in the model repository. It was built offline and has been tested offline with Python 3.12.3 under Ubuntu 24.04, and online with Python 3.13 within a Hugging Face Space.

# macOS / Linux
brew install llama.cpp
llama-server -hf robertolofaro/books-model:Q4_K_M

# Windows (WinGet)
winget install llama.cpp
llama-server -hf robertolofaro/books-model:Q4_K_M

Quick Start with llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="robertolofaro/books-model",
    filename="books-Q4_K_M.gguf",
    n_ctx=4096,
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "I am interested in AI adoption in traditional industries. Which mini-books would you recommend?"
        }
    ]
)
print(response["choices"][0]["message"]["content"])
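
The quick start above sends only a user message, while the recommendations described in this card rely on a structured system prompt. A minimal sketch of prepending one is shown below. The helper `build_messages` is hypothetical and the `SYSTEM_PROMPT` text is a placeholder, not the prompt shipped in samples_hf/qa_common.py.

```python
def build_messages(system_prompt: str, user_query: str) -> list[dict[str, str]]:
    """Return a chat message list with the system prompt first (hypothetical helper)."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query},
    ]


# Placeholder only: replace with the real prompt from samples_hf/qa_common.py.
SYSTEM_PROMPT = "You recommend mini-books by Roberto Lofaro and explain each choice briefly."

messages = build_messages(SYSTEM_PROMPT, "Which mini-books discuss privacy?")
for message in messages:
    print(message["role"], "->", message["content"])

# With a model loaded as in the quick start, the list is passed unchanged:
# response = llm.create_chat_completion(messages=messages)
```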

Retrieval-Augmented Variants (samples_hf/)

The repository includes a samples_hf/ folder with three reference implementations that demonstrate different retrieval strategies. The system prompt alone already yields good recommendations; the embedding-based variants add precision for longer or more ambiguous queries.
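
To make the retrieval idea concrete, here is a toy sketch of the retrieve-then-augment flow that the embedding-based variants follow in spirit. It is not the code in samples_hf/: exact brute-force cosine search stands in for FAISS-HNSW or Qdrant, and word counts stand in for sentence-transformer embeddings. The passages are paraphrased from the sample answer earlier in this card, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Toy corpus: two titles from the sample answer, with paraphrased descriptions.
PASSAGES = {
    "The business side of GDPR": "business side of GDPR compliance, risk assessment, legacy system upgrades",
    "From #dataleaks to #consentdata": "transition from data leaks to consent-based data management",
}


def embed(text: str) -> Counter:
    """Toy embedding: lower-cased word counts (stand-in for a real encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k titles whose passages are most similar to the query."""
    q = embed(query)
    ranked = sorted(PASSAGES, key=lambda title: cosine(q, embed(PASSAGES[title])), reverse=True)
    return ranked[:k]


def augment(query: str, k: int = 1) -> str:
    """Prepend the retrieved passages to the user question."""
    context = "\n".join(f"- {title}: {PASSAGES[title]}" for title in retrieve(query, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


print(augment("GDPR risk assessment"))
```

An approximate index such as HNSW replaces the `sorted` call when the corpus is too large to score every passage on each query.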

Mode A: System Prompt Only (no embeddings)

File: samples_hf/run_no_embeddings.py

Fastest option. It relies entirely on the structured system prompt, which encodes descriptions and themes of all 12 mini-books. No vector index is required; it runs on any machine with llama-cpp-python installed.

python samples_hf/run_no_embeddings.py \
  --query "Which books deal with post-merger integration?"

Mode B: FAISS-HNSW Index

File: samples_hf/run_faiss_hnsw.py

Uses a pre-built FAISS index (HNSW graph) over sentence-transformer embeddings of book summaries and chapters. Suitable for environments where FAISS is available and a persistent index is desirable.

# First-run: builds the index (saved locally)
python samples_hf/run_faiss_hnsw.py --build-index

# Subsequent runs: loads existing index
python samples_hf/run_faiss_hnsw.py \
  --query "Knowledge management and organisational memory"

Mode C: Qdrant Vector Store

File: samples_hf/run_qdrant.py

Uses a local Qdrant instance (or Qdrant Cloud) as the vector store. Preferred for production-style deployments or when you want persistence, filtering, and collection management.

# Start Qdrant locally (Docker)
docker run -p 6333:6333 qdrant/qdrant

# Upsert embeddings and query
python samples_hf/run_qdrant.py \
  --query "Programme management under uncertainty"

Sample Execution Output

samples_hf/ also contains pre-run execution results showing the expected model output for a representative set of queries, which is useful for calibrating expectations before running inference locally.


System Prompt Design

The model is configured with a structured system prompt that:

  • Lists all 12 mini-books with title, key themes, and target audience
  • Instructs the model to reason about relevance before responding
  • Formats recommendations as a ranked list with a one-paragraph rationale per book
  • Directs the user to the free online access point for each suggestion

The system prompt is included in all three samples_hf/ scripts and can be adapted independently of the quantisation used.
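
The structure described above can be sketched as a small prompt builder. This is an illustration under stated assumptions, not the prompt from the repository: the `Book` class and `build_system_prompt` are hypothetical, the instruction wording is invented for the example, and only two catalogue entries are shown, with themes paraphrased from the sample answer in this card.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Book:
    number: int
    title: str
    themes: str
    audience: Optional[str] = None  # left empty where this card states no audience


# Two illustrative entries; the real catalogue lists all 12 mini-books.
CATALOGUE = [
    Book(11, "From #dataleaks to #consentdata", "data leaks, consent-based data management"),
    Book(8, "The business side of GDPR", "GDPR compliance, risk assessment", "business audience"),
]


def build_system_prompt(books: list[Book]) -> str:
    """Assemble instructions plus a numbered catalogue (hypothetical wording)."""
    lines = [
        "You are a Q&A and recommendation assistant for the mini-books listed below.",
        "Reason about relevance before responding.",
        "Answer with a ranked list and a one-paragraph rationale per book.",
        "Point the user to the free online access point for each suggestion.",
        "",
        "Books:",
    ]
    for book in sorted(books, key=lambda b: b.number):
        entry = f"{book.number}. {book.title} | themes: {book.themes}"
        if book.audience:
            entry += f" | audience: {book.audience}"
        lines.append(entry)
    return "\n".join(lines)


prompt = build_system_prompt(CATALOGUE)
print(prompt)
```

Keeping the catalogue as data and the instructions as text makes it easier to adapt either one without touching the other.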


Companion Space

A Gradio-based interactive demo is available at:

🔗 robertolofaro/books

The Space runs the Q4_K_M quantisation on CPU hardware (no GPU required). It is currently private / under testing and will be made public once validated.


Limitations

  • Recommendations are bounded by the 12 mini-books in the corpus; the model will not recommend external works.
  • The model does not have live internet access; content reflects the corpus as indexed at build time.
  • CPU inference with Q4_K_M typically yields response times of 15 to 60 seconds depending on hardware; Q8_0 / BF16 benefit from GPU acceleration.
  • The BF16 variant may exhibit minor hallucinations on very specific factual queries about book content; Q4_K_M is slightly more conservative.
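
Because response times depend heavily on hardware, it can help to measure them on your own machine before choosing a quantisation. A minimal sketch of a timing wrapper follows. `timed` and `fake_inference` are hypothetical names, and the stand-in function only echoes its input so the example runs without a model.

```python
import time
from typing import Any, Callable


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[Any, float]:
    """Call fn and return its result together with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_inference(prompt: str) -> str:
    """Stand-in for a real call such as llm.create_chat_completion(...)."""
    return f"echo: {prompt}"


answer, seconds = timed(fake_inference, "Which books discuss privacy?")
print(f"answered in {seconds:.3f} s")
```

Replacing `fake_inference` with the real inference call gives a per-query latency that can be compared across the available quantisations.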

Ethical Considerations

  • The corpus consists entirely of original works by the author; no third-party copyrighted content is embedded.
  • The recommendation system is informational; it does not collect user data.
  • The model inherits any biases present in the Qwen3.5-4B base model; users should apply standard critical judgement to outputs.

Citation

If you use this model or the associated scripts in research or derivative work, please cite:

@misc{lofaro2025booksmodel,
  author       = {Roberto Lofaro},
  title        = {Books Q\&A and Recommendation Model},
  year         = {2025},
  doi          = {10.57967/hf/8832},
  url          = {https://huggingface.co/robertolofaro/books-model},
  note         = {GGUF quantisation of Qwen3.5-4B, fine-tuned for book recommendation via structured system prompt and optional retrieval (FAISS-HNSW / Qdrant)}
}

License

This model card and associated scripts are released under CC BY-SA 4.0.
The base model weights are subject to the Qwen3 License.


Published openly as part of Roberto Lofaro's AI-assisted knowledge production initiative.
GitHub · Patreon · robertolofaro.com
