Instructions to use IQuestLab/IQuest-Coder-V1-40B-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use IQuestLab/IQuest-Coder-V1-40B-Instruct with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="IQuestLab/IQuest-Coder-V1-40B-Instruct", trust_remote_code=True) messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoModelForCausalLM model = AutoModelForCausalLM.from_pretrained("IQuestLab/IQuest-Coder-V1-40B-Instruct", trust_remote_code=True, dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use IQuestLab/IQuest-Coder-V1-40B-Instruct with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "IQuestLab/IQuest-Coder-V1-40B-Instruct" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "IQuestLab/IQuest-Coder-V1-40B-Instruct", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/IQuestLab/IQuest-Coder-V1-40B-Instruct
- SGLang
How to use IQuestLab/IQuest-Coder-V1-40B-Instruct with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "IQuestLab/IQuest-Coder-V1-40B-Instruct" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "IQuestLab/IQuest-Coder-V1-40B-Instruct", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "IQuestLab/IQuest-Coder-V1-40B-Instruct" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "IQuestLab/IQuest-Coder-V1-40B-Instruct", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use IQuestLab/IQuest-Coder-V1-40B-Instruct with Docker Model Runner:
docker model run hf.co/IQuestLab/IQuest-Coder-V1-40B-Instruct
IQuest-Coder-V1 inference support and benchmarks
We’re DeployPad, a team focused on high-performance inference for open-weight models.
We’ve added official support for the IQuest-Coder-V1 family, with the goal of supporting the IQuestLab community by making the model easier and more cost-efficient to run without changing how you interact with it.
Current performance
~50–80 tokens/sec
Batch size: 32
We are also planning to add lower cost GPU options, including RTX Pro 6000 and L40s, both running at FP8 precision, to further reduce deployment cost while maintaining performance.
Benchmarks and methodology are publicly available here
https://github.com/geoddllc/large-llm-inference-benchmarks
If you want to try it yourself, you can deploy via the console
https://console.geodd.io/
(top up and run no platform fees beyond compute)
Feedback, questions, or requests from the community are welcome feel free to leave a comment below.