Regression vs original
https://huggingface.co/BeaverAI?search_models=tiger-gemma-9b-v2
v2a might have the same issue as v1, but the other versions should fare better.
@TheDrummer Okay, I will check those out. By the way, I just started playing with Big Tiger v1 (Big-Tiger-Gemma-27B-v1-IQ4_XS.gguf from https://huggingface.co/bartowski/Big-Tiger-Gemma-27B-v1-GGUF), and I see the same problem there, while the same quant of the original model always gives the correct answer.
UPDATE: I tested Tiger-Gemma-9B-v2g-Q6_K.gguf from https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2g-GGUF, and it still sometimes fails. Also, v2g refuses much more often, much like the original version.
PS: I really like the idea of uncensored models that are as smart as the original, without being messed up by intense finetuning, just given the ability to treat adults like adults. For Llama 3, a pretty nice approach along those lines was https://huggingface.co/vicgalle/Configurable-Llama-3-8B-v0.3 from @vicgalle (the model learns to follow a range of system prompts). Maybe something like that could work for the Gemma 2 series, too?
Is it even a surprise? I don't remember any model that is based on something like llama3, gemma2 etc, and is not worse than the original. At least at reasoning...
Not even a mixture of models? Have you tested Lunaris?


