jina-reranker-v1-tiny-en-GGUF
Model creator: Jina AI
Original model: jina-reranker-v1-tiny-en
GGUF quantization: based on llama.cpp release f4d2b
Trained by Jina AI.
jina-reranker-v1-tiny-en
This model is designed for blazing-fast reranking while maintaining competitive performance. What's more, it leverages the power of our JinaBERT model as its foundation. JinaBERT itself is a unique variant of the BERT architecture that supports the symmetric bidirectional variant of ALiBi. This allows jina-reranker-v1-tiny-en to process significantly longer sequences of text compared to other reranking models, up to an impressive 8,192 tokens.
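To make the ALiBi reference concrete: ALiBi adds a distance-proportional penalty to attention logits instead of using position embeddings, and the symmetric variant uses the absolute distance |i - j| so that tokens on either side of a position are penalized equally. The sketch below builds that bias in plain Python. The head count and the geometric slope schedule (taken from the original ALiBi formulation) are assumptions for illustration, not values read from this model's configuration.

```python
def alibi_slopes(num_heads: int) -> list[float]:
    """Geometric slope schedule from the ALiBi formulation (num_heads a power of two)."""
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def symmetric_alibi_bias(seq_len: int, slope: float) -> list[list[float]]:
    """Bias added to attention logits: -slope * |i - j| for query i and key j."""
    return [[-slope * abs(i - j) for j in range(seq_len)] for i in range(seq_len)]


slopes = alibi_slopes(8)  # 8 heads is an illustrative choice, not the model's setting
bias = symmetric_alibi_bias(4, slopes[0])
for row in bias:
    print(row)
```

Because the bias depends only on relative distance, it can be generated for any sequence length at inference time, with no learned position table to outgrow.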
To achieve this speed, jina-reranker-v1-tiny-en employs a technique called knowledge distillation. Here, a complex but slower model (like our original jina-reranker-v1-base-en) acts as a teacher, condensing its knowledge into a smaller, faster student model. This student retains most of the teacher's knowledge, allowing it to deliver similar accuracy in a fraction of the time.
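The teacher-student idea can be sketched as a loss that pulls the student's relevance scores toward the teacher's. This card does not state which distillation objective was used, so the mean-squared-error loss and the example scores below are purely illustrative assumptions.

```python
def distillation_mse(teacher_scores: list[float], student_scores: list[float]) -> float:
    """Mean squared error between teacher and student relevance scores."""
    if len(teacher_scores) != len(student_scores):
        raise ValueError("score lists must have the same length")
    squared = [(t - s) ** 2 for t, s in zip(teacher_scores, student_scores)]
    return sum(squared) / len(squared)


# Hypothetical scores for three query-document pairs.
teacher = [0.9, 0.2, 0.6]
student = [0.8, 0.3, 0.6]
print(distillation_mse(teacher, student))
```

Minimizing a loss of this kind during training nudges the smaller model to reproduce the larger model's judgments without needing its capacity at inference time.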
Here's a breakdown of the reranker models we provide:
| Model Name | Layers | Hidden Size | Parameters (Millions) |
|---|---|---|---|
| jina-reranker-v1-base-en | 12 | 768 | 137.0 |
| jina-reranker-v1-turbo-en | 6 | 384 | 37.8 |
| jina-reranker-v1-tiny-en | 4 | 384 | 33.0 |
Currently, the jina-reranker-v1-base-en model is not available on Hugging Face. You can access it via the Jina AI Reranker API.
As you can see, the jina-reranker-v1-turbo-en offers a balanced approach with 6 layers and 37.8 million parameters. This translates to fast search and reranking while preserving a high degree of accuracy. The jina-reranker-v1-tiny-en prioritizes speed even further, achieving the fastest inference speeds with its 4-layer, 33.0 million parameter architecture. This makes it ideal for scenarios where absolute top accuracy is less crucial.
Usage
- The easiest way to start using jina-reranker-v1-tiny-en is to use Jina AI's Reranker API.
curl https://api.jina.ai/v1/rerank \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "jina-reranker-v1-tiny-en",
"query": "Organic skincare products for sensitive skin",
"documents": [
"Eco-friendly kitchenware for modern homes",
"Biodegradable cleaning supplies for eco-conscious consumers",
"Organic cotton baby clothes for sensitive skin",
"Natural organic skincare range for sensitive skin",
"Tech gadgets for smart homes: 2024 edition",
"Sustainable gardening tools and compost solutions",
"Sensitive skin-friendly facial cleansers and toners",
"Organic food wraps and storage solutions",
"All-natural pet food for dogs with allergies",
"Yoga mats made from recycled materials"
],
"top_n": 3
}'
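The same request can be issued from Python. This sketch uses only the standard library and mirrors the curl call above; the helper names `build_rerank_payload` and `rerank` are hypothetical, and the response is returned as parsed JSON without assuming anything about its fields.

```python
import json
import urllib.request

API_URL = "https://api.jina.ai/v1/rerank"  # endpoint taken from the curl example


def build_rerank_payload(query: str, documents: list[str], top_n: int) -> dict:
    """Assemble the JSON body used by the curl example."""
    return {
        "model": "jina-reranker-v1-tiny-en",
        "query": query,
        "documents": documents,
        "top_n": top_n,
    }


def rerank(api_key: str, payload: dict) -> dict:
    """POST the payload and return the parsed JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_rerank_payload(
    "Organic skincare products for sensitive skin",
    [
        "Natural organic skincare range for sensitive skin",
        "Tech gadgets for smart homes: 2024 edition",
    ],
    top_n=1,
)
print(json.dumps(payload, indent=2))
# To send it: rerank("YOUR_API_KEY", payload)
```

Keeping payload construction separate from the network call makes the request body easy to inspect or unit-test without an API key.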
- Alternatively, you can use the latest version of the sentence-transformers>=0.27.0 library. You can install it via pip:
pip install -U sentence-transformers
Then, you can use the following code to interact with the model:
from sentence_transformers import CrossEncoder
# Load the model, here we use our tiny sized model
model = CrossEncoder("jinaai/jina-reranker-v1-tiny-en", trust_remote_code=True)
# Example query and documents
query = "Organic skincare products for sensitive skin"
documents = [
"Eco-friendly kitchenware for modern homes",
"Biodegradable cleaning supplies for eco-conscious consumers",
"Organic cotton baby clothes for sensitive skin",
"Natural organic skincare range for sensitive skin",
"Tech gadgets for smart homes: 2024 edition",
"Sustainable gardening tools and compost solutions",
"Sensitive skin-friendly facial cleansers and toners",
"Organic food wraps and storage solutions",
"All-natural pet food for dogs with allergies",
"Yoga mats made from recycled materials"
]
results = model.rank(query, documents, return_documents=True, top_k=3)
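`model.rank` returns a list of dictionaries sorted by descending score; with `return_documents=True`, each entry carries `corpus_id`, `score`, and `text`. The helper below, whose name is hypothetical, formats such a list for display. The sample entries are made-up placeholders so the snippet runs without downloading the model; they are not real model outputs.

```python
def format_ranking(results: list[dict]) -> list[str]:
    """Turn CrossEncoder.rank output into readable, numbered lines."""
    return [
        f"{rank}. [{item['corpus_id']}] {item['text']} (score={item['score']:.2f})"
        for rank, item in enumerate(results, start=1)
    ]


# Placeholder entries shaped like rank(..., return_documents=True) output.
sample_results = [
    {"corpus_id": 3, "score": 0.90, "text": "Natural organic skincare range for sensitive skin"},
    {"corpus_id": 6, "score": 0.50, "text": "Sensitive skin-friendly facial cleansers and toners"},
]
lines = format_ranking(sample_results)
for line in lines:
    print(line)
```

The `corpus_id` field is the index of the document in the original `documents` list, which is useful when the text itself is not returned.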
- You can also use the transformers library to interact with the model programmatically.
!pip install transformers
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained(
'jinaai/jina-reranker-v1-tiny-en', num_labels=1, trust_remote_code=True
)
# Example query and documents
query = "Organic skincare products for sensitive skin"
documents = [
"Eco-friendly kitchenware for modern homes",
"Biodegradable cleaning supplies for eco-conscious consumers",
"Organic cotton baby clothes for sensitive skin",
"Natural organic skincare range for sensitive skin",
"Tech gadgets for smart homes: 2024 edition",
"Sustainable gardening tools and compost solutions",
"Sensitive skin-friendly facial cleansers and toners",
"Organic food wraps and storage solutions",
"All-natural pet food for dogs with allergies",
"Yoga mats made from recycled materials"
]
# construct sentence pairs
sentence_pairs = [[query, doc] for doc in documents]
scores = model.compute_score(sentence_pairs)
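Turning those scores into a ranking is a matter of sorting documents by score. A minimal sketch, assuming `scores` is a list of floats aligned with `documents` in input order; the helper name `top_k_documents` and the sample scores are hypothetical.

```python
def top_k_documents(
    documents: list[str], scores: list[float], k: int
) -> list[tuple[str, float]]:
    """Pair each document with its score and keep the k highest-scoring pairs."""
    if len(documents) != len(scores):
        raise ValueError("documents and scores must align")
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up scores, for illustration only.
docs = ["doc a", "doc b", "doc c"]
fake_scores = [0.1, 0.8, 0.4]
top = top_k_documents(docs, fake_scores, k=2)
print(top)
```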
- You can also use the transformers.js library to run the model directly in JavaScript (in-browser, Node.js, Deno, etc.)!
If you haven't already, you can install the Transformers.js JavaScript library from NPM using:
npm i @xenova/transformers
Then, you can use the following code to interact with the model:
import { AutoTokenizer, AutoModelForSequenceClassification } from '@xenova/transformers';
const model_id = 'jinaai/jina-reranker-v1-tiny-en';
const model = await AutoModelForSequenceClassification.from_pretrained(model_id, { quantized: false });
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
/**
* Performs ranking with the CrossEncoder on the given query and documents. Returns a sorted list with the document indices and scores.
* @param {string} query A single query
* @param {string[]} documents A list of documents
* @param {Object} options Options for ranking
* @param {number} [options.top_k=undefined] Return the top-k documents. If undefined, all documents are returned.
* @param {boolean} [options.return_documents=false] If true, also returns the documents. If false, only returns the indices and scores.
*/
async function rank(query, documents, {
  top_k = undefined,
  return_documents = false,
} = {}) {
  const inputs = tokenizer(
    new Array(documents.length).fill(query),
    { text_pair: documents, padding: true, truncation: true }
  )
  const { logits } = await model(inputs);
  return logits.sigmoid().tolist()
    .map(([score], i) => ({
      corpus_id: i,
      score,
      ...(return_documents ? { text: documents[i] } : {})
    })).sort((a, b) => b.score - a.score).slice(0, top_k);
}
// Example usage:
const query = "Organic skincare products for sensitive skin"
const documents = [
"Eco-friendly kitchenware for modern homes",
"Biodegradable cleaning supplies for eco-conscious consumers",
"Organic cotton baby clothes for sensitive skin",
"Natural organic skincare range for sensitive skin",
"Tech gadgets for smart homes: 2024 edition",
"Sustainable gardening tools and compost solutions",
"Sensitive skin-friendly facial cleansers and toners",
"Organic food wraps and storage solutions",
"All-natural pet food for dogs with allergies",
"Yoga mats made from recycled materials",
]
const results = await rank(query, documents, { return_documents: true, top_k: 3 });
console.log(results);
That's it! You can now use the jina-reranker-v1-tiny-en model in your projects.
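The JavaScript `rank` function above squashes each raw logit through a sigmoid before sorting. For readers following along in Python, the same post-processing can be sketched as below; the logit values are invented for illustration, and the function name `rank_from_logits` is hypothetical.

```python
import math
from typing import Optional


def sigmoid(x: float) -> float:
    """Logistic function mapping a real-valued logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def rank_from_logits(logits: list[float], top_k: Optional[int] = None) -> list[dict]:
    """Mirror the JavaScript rank(): score each logit, sort descending, slice."""
    scored = [{"corpus_id": i, "score": sigmoid(z)} for i, z in enumerate(logits)]
    scored.sort(key=lambda item: item["score"], reverse=True)
    return scored[:top_k]  # top_k=None keeps every entry


ranked = rank_from_logits([-2.0, 1.5, 0.0], top_k=2)
print(ranked)
```

Since the sigmoid is monotonic, it changes the scale of the scores but not their order, so ranking on raw logits would select the same documents.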
Evaluation
We evaluated Jina Reranker on 3 key benchmarks to ensure top-tier performance and search relevance.
| Model Name | NDCG@10 (17 BEIR datasets) | NDCG@10 (5 LoCo datasets) | Hit Rate (LlamaIndex RAG) |
|---|---|---|---|
| jina-reranker-v1-base-en | 52.45 | 87.31 | 85.53 |
| jina-reranker-v1-turbo-en | 49.60 | 69.21 | 85.13 |
| jina-reranker-v1-tiny-en (you are here) | 48.54 | 70.29 | 85.00 |
| mxbai-rerank-base-v1 | 49.19 | - | 82.50 |
| mxbai-rerank-xsmall-v1 | 48.80 | - | 83.69 |
| ms-marco-MiniLM-L-6-v2 | 48.64 | - | 82.63 |
| ms-marco-MiniLM-L-4-v2 | 47.81 | - | 83.82 |
| bge-reranker-base | 47.89 | - | 83.03 |
Note:
- NDCG@10 is a measure of ranking quality, with higher scores indicating better search results. Hit Rate measures the percentage of relevant documents that appear in the top 10 search results.
- LoCo results for the other models are not available because they do not support documents longer than 512 tokens.
For more details, please refer to our benchmarking sheets.
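For readers who want to see how metrics like NDCG@10 and Hit Rate are computed, here is a minimal sketch. It assumes the common NDCG formulation with linear gain and a log2 discount, with the ideal ordering taken over the supplied relevance list, and it treats hit rate as the share of queries with at least one relevant document in the top k. The benchmarks' exact implementations may differ, and the relevance labels are invented.

```python
import math


def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain over the first k ranked items."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: list[float], k: int = 10) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def hit_rate_at_k(ranked_relevances: list[list[float]], k: int = 10) -> float:
    """Fraction of queries with at least one relevant item in the top k."""
    hits = sum(1 for rels in ranked_relevances if any(r > 0 for r in rels[:k]))
    return hits / len(ranked_relevances)


# Invented relevance labels, listed in the order a reranker returned the documents.
print(ndcg_at_k([1, 0, 1, 0], k=10))
print(hit_rate_at_k([[0, 1, 0], [0, 0, 0]], k=10))
```

A perfect ordering yields an NDCG of 1.0, and pushing relevant documents lower in the list reduces the score through the logarithmic discount.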
Contact
Join our Discord community and chat with other community members about ideas.