Regression vs original

by MoonRide - opened Jul 16, 2024

Discussion

MoonRide

Jul 16, 2024

I noticed a regression vs. the original Gemma during initial tests (the original model didn't fail). It happens about once or twice per 10 attempts, like this:

Launched using llama-server.exe -v -ngl 99 -m gemma-2-9b-it-Q6_K.gguf, setup as below:
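
The prompt and settings referred to above are not included in this text. As a rough sketch of how a "once or twice per 10 attempts" failure rate could be measured, the script below sends the same prompt repeatedly to a locally running llama-server through its OpenAI-compatible chat endpoint and counts replies that miss an expected answer. The address (llama-server's default port 8080), the helper names `ask`, `count_failures` and `run_trials`, and the sample prompt in the closing comment are all assumptions for illustration, not the poster's actual test.

```python
import json
import urllib.request

# Assumption: llama-server is listening on its default host and port.
URL = "http://localhost:8080/v1/chat/completions"


def ask(prompt, url=URL, timeout=120):
    """Send one user message to the server and return the reply text."""
    payload = {"messages": [{"role": "user", "content": prompt}]}
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),  # a body makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


def count_failures(replies, expected):
    """Count replies that do not contain the expected text (case-insensitive)."""
    return sum(1 for reply in replies if expected.lower() not in reply.lower())


def run_trials(prompt, expected, attempts=10, ask_fn=ask):
    """Ask the same question `attempts` times; return (failures, attempts)."""
    replies = [ask_fn(prompt) for _ in range(attempts)]
    return count_failures(replies, expected), attempts


# Example with a hypothetical prompt (needs a running server):
# failures, attempts = run_trials("What is 17 + 25? Reply with the number only.", "42")
# print(f"{failures}/{attempts} attempts failed")
```

Running the same trial against the original model and the finetune with identical sampling settings gives two rates that can be compared directly. A substring check is a crude pass/fail test, so it only suits prompts with a short, unambiguous answer.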

TheDrummer

Owner Jul 16, 2024

https://huggingface.co/BeaverAI?search_models=tiger-gemma-9b-v2

v2a might have the same issue as v1, but the other versions should fare better.

MoonRide

Jul 16, 2024 (edited Jul 16, 2024)

@TheDrummer Okay, will check those out. Btw, I just started playing with Big Tiger v1 (Big-Tiger-Gemma-27B-v1-IQ4_XS.gguf from https://huggingface.co/bartowski/Big-Tiger-Gemma-27B-v1-GGUF), and I see the same problem there (while the same quant of the original always gives the correct answer).

UPDATE: I tested Tiger-Gemma-9B-v2g-Q6_K.gguf from https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2g-GGUF, and it still sometimes fails. Also, v2g refuses much more, like the original version.

P.S. I really like the idea of uncensored models being as smart as the original, without messing them up with intense finetuning, just giving them the ability to treat adults like adults. For Llama 3, a pretty nice approach like that was https://huggingface.co/vicgalle/Configurable-Llama-3-8B-v0.3 from @vicgalle (the model learns to follow a range of system prompts). Maybe something like that could work for the Gemma 2 series, too?

urtuuuu

Jul 16, 2024 (edited Jul 16, 2024)

Is it even a surprise? I don't remember any model that is based on something like llama3, gemma2 etc, and is not worse than the original. At least at reasoning...

Dihelson

Jul 16, 2024

> Is it even a surprise? I don't remember any model that is based on something like llama3, gemma2 etc, and is not worse than the original. At least at reasoning...

Not even a mixture of models? Have you tested Lunaris?
