Instructions to use openchat/openchat_3.5 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use openchat/openchat_3.5 with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="openchat/openchat_3.5")# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("openchat/openchat_3.5", dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use openchat/openchat_3.5 with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "openchat/openchat_3.5" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "openchat/openchat_3.5", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/openchat/openchat_3.5
- SGLang
How to use openchat/openchat_3.5 with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "openchat/openchat_3.5" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "openchat/openchat_3.5", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "openchat/openchat_3.5" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "openchat/openchat_3.5", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use openchat/openchat_3.5 with Docker Model Runner:
docker model run hf.co/openchat/openchat_3.5
Which is the actual way to store the Adapter after PEFT finetuning
#42
by Pradeep1995 - opened
I am finetuning the mistral model using the following configurations
training_arguments = TrainingArguments(
output_dir=output_dir,
per_device_train_batch_size=per_device_train_batch_size,
gradient_accumulation_steps=gradient_accumulation_steps,
optim=optim,
save_steps=save_steps,
logging_strategy="steps",
logging_steps=10,
learning_rate=learning_rate,
weight_decay=weight_decay,
fp16=fp16,
bf16=bf16,
max_grad_norm=max_grad_norm,
max_steps=13000,
warmup_ratio=warmup_ratio,
group_by_length=group_by_length,
lr_scheduler_type=lr_scheduler_type
)
trainer = SFTTrainer(
model=peft_model,
train_dataset=data,
peft_config=peft_config,
dataset_text_field=" column name",
max_seq_length=3000,
tokenizer=tokenizer,
args=training_arguments,
packing=packing,
)
trainer.train()
during this training I am getting the multiple checkpoints in the specified output directory output_dir.
Once the model training is over I can save the model using
trainer.save_model()
Not only that i can save the final model using
trainer.model.save_pretrained("path")
So I bit confused. Which is the actual way to store the adapter after PEFT based lora fine-tuning
whether it is
1 - Take the least loss checkpoint folder from the output_dir
or
2 - save the adapter using
trainer.save_model()
or
3 - this method
trainer.model.save_pretrained("path")