tokenizer.model?

by jlinux - opened Feb 18, 2024

Discussion

jlinux

Feb 18, 2024

edited Feb 19, 2024

I see the tokenizer files are not in the format that llama.cpp can usually convert. Are there any plans to support llama.cpp with a GGUF version?

QagentS

Pipable Inc org Feb 19, 2024

It's a Llama tokenizer. The standard Llama tokenizer defaults to the fast tokenizer, I think.
I was facing some consistency issues with the fast tokenizer, so I defaulted it to the non-fast one.

jlinux

Feb 19, 2024

Hmm. When converting with llama.cpp's convert.py, it complains about the vocab size being 32022 instead of 32256. When I change vocab_size in config.json to 32022, it converts, but the resulting model cannot be loaded. I wanted to give you a heads-up; any insight anyone can provide is appreciated.
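For anyone hitting the same message, the two numbers can be read straight from a local copy of the checkpoint. The following is a minimal sketch, assuming the directory holds a `config.json` with a `vocab_size` field and a Hugging Face `tokenizer.json` that lists base tokens under `model.vocab` and extra tokens under `added_tokens`. The helper name `vocab_sizes` is hypothetical and not part of any library.

```python
import json
from pathlib import Path


def vocab_sizes(model_dir):
    """Return (config_vocab_size, tokenizer_vocab_size) for a local checkpoint.

    Assumption: Hugging Face layout, i.e. config.json has "vocab_size" and
    tokenizer.json has model.vocab plus an optional added_tokens list.
    """
    model_dir = Path(model_dir)
    config = json.loads((model_dir / "config.json").read_text(encoding="utf-8"))
    tok = json.loads((model_dir / "tokenizer.json").read_text(encoding="utf-8"))

    # Collect distinct token ids so a token listed in both places counts once.
    ids = set(tok["model"]["vocab"].values())
    ids.update(entry["id"] for entry in tok.get("added_tokens", []))
    return config["vocab_size"], len(ids)
```

Calling `vocab_sizes("path/to/checkpoint")` returns both sizes, so a difference like the one reported by the converter can be confirmed before editing any file.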

QagentS

Pipable Inc org Feb 19, 2024

Can you try going into your llama model directly and editing the params.json "vocab_size" to be 32022?
There is room for a mismatch between the model's vocab size and the tokenizer's vocab size.
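The reason such a mismatch can be harmless is that an embedding table may simply have more rows than the tokenizer has ids, and the extra rows are never looked up. The sketch below illustrates that arithmetic with the two sizes quoted in this thread. It assumes tokenizer ids form the contiguous range starting at 0, and the function name `unused_embedding_rows` is made up for this example.

```python
def unused_embedding_rows(tokenizer_size, model_vocab_size):
    """Count embedding rows that no tokenizer id points at.

    Assumption: tokenizer ids are exactly 0 .. tokenizer_size - 1.
    A tokenizer larger than the table is an error, because some ids
    would index past the last row.
    """
    if tokenizer_size > model_vocab_size:
        raise ValueError("tokenizer has more ids than the embedding has rows")
    return model_vocab_size - tokenizer_size


# Sizes taken from the convert.py message quoted above.
print(unused_embedding_rows(32022, 32256))  # 234
```

Note that editing `vocab_size` in a config file does not resize the embedding tensor stored in the checkpoint, so the weights would presumably still have the larger number of rows afterwards.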

jlinux

Feb 19, 2024

edited Feb 19, 2024

I did not find a params.json in the repo. I added one, but it does not appear to make a difference. I changed the config.json, and when loading the model in the llama.cpp server it gives the following error:

llama_model_loader: - type f32: 219 tensors
llama_model_load: error loading model: unordered_map::at
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model './PipableAI/pipSQL-1.3b/ggml-model-f32.gguf'
{"timestamp":1708311357,"level":"ERROR","function":"load_model","line":377,"message":"unable to load model","model":"./PipableAI/pipSQL-1.3b/ggml-model-f32.gguf"}
terminate called without an active exception
Aborted

Nintendo24

Feb 19, 2024

Give us a day; we will debug and update this.
Thank you so much for pointing us to it.

jlinux

Feb 19, 2024

Apologies for spinning cycles; I was tripping over my own feet. I had generated the tokenizer.model from the working PyTorch Python code, which compounded the issue that I was using the Hugging Face vocab instead of BPE, and together these produced the errors above. Using BPE solved the issue, and the GGUF was generated successfully.

Like this model a lot, appreciate everyone's hand in its success.
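For readers who land here with the same conversion error, one way to check which vocabulary type a checkpoint uses is to look at the `type` field under `model` in its `tokenizer.json`. The following is a minimal sketch, assuming a local copy of the repo that includes a Hugging Face `tokenizer.json`; the helper name `tokenizer_model_type` is hypothetical.

```python
import json
from pathlib import Path


def tokenizer_model_type(model_dir):
    """Return the algorithm name stored in tokenizer.json, or None.

    Assumption: the file follows the Hugging Face tokenizers layout,
    where the algorithm (for example "BPE") is stored at model.type.
    Returns None when the file or the field is missing.
    """
    path = Path(model_dir) / "tokenizer.json"
    if not path.is_file():
        return None
    data = json.loads(path.read_text(encoding="utf-8"))
    return data.get("model", {}).get("type")
```

If this returns `"BPE"`, a BPE vocabulary is the one to select when converting, which matches the resolution described above.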

jlinux changed discussion status to closed Feb 19, 2024
