Why does it report an error like this when running?
Hello openchat team,
root@autodl-container-a44e4284cd-59f17d3a:/autodl-tmp# python -m ochat.serving.openai_api_server --model openchat/openchat_3.5
FlashAttention not found. Install it if you need to train models.
FlashAttention not found. Install it if you need to train models.
2023-11-10 01:15:51,222 WARNING utils.py:581 -- Detecting docker specified CPUs. In previous versions of Ray, CPU detection in containers was incorrect. Please ensure that Ray has enough CPUs allocated. As a temporary workaround to revert to the prior behavior, set RAY_USE_MULTIPROCESSING_CPU_COUNT=1 as an env var before starting Ray. Set the env var: RAY_DISABLE_DOCKER_CPU_WARNING=1 to mute this warning.
2023-11-10 01:15:52,296 INFO worker.py:1673 -- Started a local Ray instance.
(pid=3924) FlashAttention not found. Install it if you need to train models.
(pid=3924) FlashAttention not found. Install it if you need to train models.
(AsyncTokenizer pid=3924) Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 11-10 01:15:56 llm_engine.py:72] Initializing an LLM engine with config: model='openchat/openchat_3.5', tokenizer='openchat/openchat_3.5', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, seed=0)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 11-10 01:16:10 llm_engine.py:207] # GPU blocks: 3490, # CPU blocks: 2048
INFO: Started server process [3216]
INFO: Waiting for application startup.
INFO: Application startup complete.
ERROR: [Errno 99] error while attempting to bind on address ('::1', 18888, 0, 0): cannot assign requested address
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
root@autodl-container-a44e4284cd-59f17d3a:
As far as I can tell, the environment is installed correctly. The GPU is a single RTX 4090. The command I ran is:
python -m ochat.serving.openai_api_server --model openchat/openchat_3.5
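The error means the server tried to bind to ::1, the IPv6 loopback address, on port 18888, and that address is not available on your machine (most likely IPv6 is disabled in the container). Try adding --host 127.0.0.1 as a command-line argument so the server binds to the IPv4 loopback address instead:
python -m ochat.serving.openai_api_server --model openchat/openchat_3.5 --host 127.0.0.1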
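If you want to confirm the diagnosis before restarting the server, a quick check along these lines may help. This is only a minimal sketch using Python's standard socket module; the helper name can_bind is made up for illustration and is not part of ochat. It tries to bind a TCP socket to each loopback address on an OS-chosen port and reports whether the bind succeeded.

```python
import socket


def can_bind(family, address, port=0):
    """Return True if a TCP socket of the given family can bind to address.

    Hypothetical helper for illustration only. Port 0 asks the OS to pick
    any free port, so this does not collide with a running server.
    """
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.bind((address, port))
        return True
    except OSError:
        # Covers both "address family not supported" and
        # "cannot assign requested address" (Errno 99 on Linux).
        return False


if __name__ == "__main__":
    print("IPv4 loopback (127.0.0.1):", can_bind(socket.AF_INET, "127.0.0.1"))
    print("IPv6 loopback (::1):", can_bind(socket.AF_INET6, "::1"))
```

If the second line prints False while the first prints True, the container has no usable IPv6 loopback, which matches the Errno 99 in the log, and binding to 127.0.0.1 should get around it.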