Deployment Failure under vLLM

#11
by KeeKalm - opened

Following the vLLM documentation, I added the startup parameters --tokenizer Qwen/Qwen3.5-35B-A3B --hf-config-path Qwen/Qwen3.5-35B-A3B when launching the GGUF model:

vllm serve ./Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-IQ2_M.gguf --served-model-name my-qwen-model --port 8000 --tokenizer Qwen/Qwen3.5-35B-A3B --hf-config-path Qwen/Qwen3.5-35B-A3B

I also tried supplying the --hf-config-path configuration file locally. Since this repository does not contain a config.json, I downloaded the file from the Qwen/Qwen3.5-35B-A3B repository and passed it as --hf-config-path ./config.json. However, deployment still fails.
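Before pointing vLLM at an external config, it may help to confirm what architecture string the GGUF file itself advertises, since that string (not the external config.json) is what triggers the error below. Here is a minimal header parser, a sketch based on the GGUF file layout; it assumes `general.architecture` appears among the leading string-typed metadata keys (the `gguf` Python package handles the full format):

```python
import struct

GGUF_MAGIC = b"GGUF"
GGUF_TYPE_STRING = 8  # GGUF metadata value-type code for strings

def read_architecture(data: bytes) -> str:
    """Return the value of the 'general.architecture' metadata key
    from the start of a GGUF file's bytes (simplified sketch)."""
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    # Header layout: magic(4) | version(u32) | tensor_count(u64) | kv_count(u64)
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    offset = 4 + 4 + 8 + 8
    for _ in range(kv_count):
        (key_len,) = struct.unpack_from("<Q", data, offset); offset += 8
        key = data[offset:offset + key_len].decode(); offset += key_len
        (vtype,) = struct.unpack_from("<I", data, offset); offset += 4
        if vtype != GGUF_TYPE_STRING:
            # This sketch only decodes string values; use the gguf
            # package for files whose early keys have other types.
            raise ValueError(f"unhandled value type {vtype}")
        (val_len,) = struct.unpack_from("<Q", data, offset); offset += 8
        value = data[offset:offset + val_len].decode(); offset += val_len
        if key == "general.architecture":
            return value
    raise ValueError("general.architecture not found")
```

For example, `read_architecture(open("model.gguf", "rb").read(1 << 20))` would show whether the file reports an architecture like `qwen35moe` that the loader does not recognize.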

Environment:

  • OS: Windows WSL, Ubuntu 22
  • GPU: RTX 3060
  • vLLM 0.17.1

Below is the error output:

(APIServer pid=823) INFO 03-19 23:22:16 [utils.py:302]
(APIServer pid=823) INFO 03-19 23:22:16 [utils.py:302]        β–ˆ     β–ˆ     β–ˆβ–„   β–„β–ˆ
(APIServer pid=823) INFO 03-19 23:22:16 [utils.py:302]  β–„β–„ β–„β–ˆ β–ˆ     β–ˆ     β–ˆ β–€β–„β–€ β–ˆ  version 0.17.1
(APIServer pid=823) INFO 03-19 23:22:16 [utils.py:302]   β–ˆβ–„β–ˆβ–€ β–ˆ     β–ˆ     β–ˆ     β–ˆ  model   ./Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-IQ2_M.gguf
(APIServer pid=823) INFO 03-19 23:22:16 [utils.py:302]    β–€β–€  β–€β–€β–€β–€β–€ β–€β–€β–€β–€β–€ β–€     β–€
(APIServer pid=823) INFO 03-19 23:22:16 [utils.py:302]
(APIServer pid=823) INFO 03-19 23:22:16 [utils.py:238] non-default args: {'model_tag': './Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-IQ2_M.gguf', 'model': './Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-IQ2_M.gguf', 'tokenizer': 'Qwen/Qwen3.5-35B-A3B', 'hf_config_path': 'Qwen/Qwen3.5-35B-A3B', 'served_model_name': ['my-qwen-model']}
(APIServer pid=823) WARNING 03-19 23:22:16 [system_utils.py:287] Found ulimit of 1024 and failed to automatically increase with error current limit exceeds maximum limit. This can cause fd limit errors like `OSError: [Errno 24] Too many open files`. Consider increasing with ulimit -n
(APIServer pid=823) Traceback (most recent call last):
(APIServer pid=823)   File "/home/ubuntu_wsl/vllm/.venv/bin/vllm", line 8, in <module>
(APIServer pid=823)     sys.exit(main())
(APIServer pid=823)   File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=823)     args.dispatch_function(args)
(APIServer pid=823)   File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/cli/serve.py", line 112, in cmd
(APIServer pid=823)     uvloop.run(run_server(args))
(APIServer pid=823)   File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/uvloop/__init__.py", line 69, in run
(APIServer pid=823)     return loop.run_until_complete(wrapper())
(APIServer pid=823)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=823)   File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=823)     return await main
(APIServer pid=823)   File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 471, in run_server
(APIServer pid=823)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=823)   File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 490, in run_server_worker
(APIServer pid=823)     async with build_async_engine_client(
(APIServer pid=823)   File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=823)     return await anext(self.gen)
(APIServer pid=823)   File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client
(APIServer pid=823)     async with build_async_engine_client_from_engine_args(
(APIServer pid=823)   File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=823)     return await anext(self.gen)
(APIServer pid=823)   File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 122, in build_async_engine_client_from_engine_args
(APIServer pid=823)     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=823)   File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 1468, in create_engine_config
(APIServer pid=823)     maybe_override_with_speculators(
(APIServer pid=823)   File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/vllm/transformers_utils/config.py", line 520, in maybe_override_with_speculators
(APIServer pid=823)     config_dict, _ = PretrainedConfig.get_config_dict(
(APIServer pid=823)   File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 662, in get_config_dict
(APIServer pid=823)     config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
(APIServer pid=823)   File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 753, in _get_config_dict
(APIServer pid=823)     config_dict = load_gguf_checkpoint(resolved_config_file, return_tensors=False)["config"]
(APIServer pid=823)   File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 431, in load_gguf_checkpoint
(APIServer pid=823)     raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.")
(APIServer pid=823) ValueError: GGUF model with architecture qwen35moe is not supported yet.
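The final ValueError comes from transformers' `load_gguf_checkpoint`, which rejects any architecture string from the GGUF metadata that is not in its supported mapping. Conceptually the failing check is a lookup like the following simplified sketch (not the library's actual code; the supported set shown is illustrative only):

```python
# Illustrative subset of architecture names, NOT transformers' real list.
SUPPORTED_GGUF_ARCHITECTURES = {"llama", "qwen2", "qwen2moe", "qwen3", "qwen3moe"}

def check_gguf_architecture(architecture: str) -> None:
    """Mirror of the kind of guard that raises in load_gguf_checkpoint:
    the architecture read from GGUF metadata must be a known name."""
    if architecture not in SUPPORTED_GGUF_ARCHITECTURES:
        raise ValueError(
            f"GGUF model with architecture {architecture} is not supported yet."
        )
```

Since the check fires while transformers parses the GGUF metadata itself, supplying an external config via --hf-config-path does not bypass it, which would explain why replacing config.json made no difference.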
