Deployment Failure under vLLM
#11
by KeeKalm - opened
Following the vLLM documentation, I added the startup parameters --tokenizer Qwen/Qwen3.5-35B-A3B --hf-config-path Qwen/Qwen3.5-35B-A3B when launching the GGUF model:
vllm serve ./Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-IQ2_M.gguf --served-model-name my-qwen-model --port 8000 --tokenizer Qwen/Qwen3.5-35B-A3B --hf-config-path Qwen/Qwen3.5-35B-A3B
I also tried pointing --hf-config-path at a local configuration file. Since this repository does not contain a config.json, I downloaded the file from the Qwen/Qwen3.5-35B-A3B repository and passed --hf-config-path ./config.json instead. However, deployment still fails.
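For debugging, here is a minimal sketch (not part of vLLM or transformers; the helper name is hypothetical) that parses the GGUF header by hand to confirm which architecture string the file advertises. It assumes `general.architecture` appears among the leading string-typed metadata keys, as files written by llama.cpp's standard GGUF writer conventionally do:

```python
# Hypothetical helper: read the `general.architecture` metadata key
# straight from a GGUF file header, to see the string transformers checks.
# Assumes the key appears before any non-string metadata value (true for
# files produced by llama.cpp's standard GGUF writer).
import struct

def gguf_architecture(path: str) -> str:
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        (_version,) = struct.unpack("<I", f.read(4))
        _tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
        for _ in range(kv_count):
            (key_len,) = struct.unpack("<Q", f.read(8))
            key = f.read(key_len).decode("utf-8")
            (value_type,) = struct.unpack("<I", f.read(4))
            if value_type != 8:  # 8 = GGUF string type
                break            # stop at the first non-string value
            (val_len,) = struct.unpack("<Q", f.read(8))
            value = f.read(val_len).decode("utf-8")
            if key == "general.architecture":
                return value
    raise ValueError("general.architecture not found in leading metadata")

# Usage:
#   gguf_architecture("./Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-IQ2_M.gguf")
# per the traceback below, this file reports "qwen35moe".
```

This only confirms what the error already states; the actual fix depends on whether the installed transformers version maps that architecture string.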
Environment:
- OS: Windows, WSL (Ubuntu 22)
- GPU: RTX 3060
- vLLM: 0.17.1
Below is the error output:
(APIServer pid=823) INFO 03-19 23:22:16 [utils.py:302] [vLLM startup banner] version 0.17.1
(APIServer pid=823) INFO 03-19 23:22:16 [utils.py:302] [vLLM startup banner] model ./Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-IQ2_M.gguf
(APIServer pid=823) INFO 03-19 23:22:16 [utils.py:238] non-default args: {'model_tag': './Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-IQ2_M.gguf', 'model': './Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-IQ2_M.gguf', 'tokenizer': 'Qwen/Qwen3.5-35B-A3B', 'hf_config_path': 'Qwen/Qwen3.5-35B-A3B', 'served_model_name': ['my-qwen-model']}
(APIServer pid=823) WARNING 03-19 23:22:16 [system_utils.py:287] Found ulimit of 1024 and failed to automatically increase with error current limit exceeds maximum limit. This can cause fd limit errors like `OSError: [Errno 24] Too many open files`. Consider increasing with ulimit -n
(APIServer pid=823) Traceback (most recent call last):
(APIServer pid=823) File "/home/ubuntu_wsl/vllm/.venv/bin/vllm", line 8, in <module>
(APIServer pid=823) sys.exit(main())
(APIServer pid=823) File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=823) args.dispatch_function(args)
(APIServer pid=823) File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/cli/serve.py", line 112, in cmd
(APIServer pid=823) uvloop.run(run_server(args))
(APIServer pid=823) File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/uvloop/__init__.py", line 69, in run
(APIServer pid=823) return loop.run_until_complete(wrapper())
(APIServer pid=823) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=823) File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=823) return await main
(APIServer pid=823) File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 471, in run_server
(APIServer pid=823) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=823) File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 490, in run_server_worker
(APIServer pid=823) async with build_async_engine_client(
(APIServer pid=823) File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=823) return await anext(self.gen)
(APIServer pid=823) File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client
(APIServer pid=823) async with build_async_engine_client_from_engine_args(
(APIServer pid=823) File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=823) return await anext(self.gen)
(APIServer pid=823) File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 122, in build_async_engine_client_from_engine_args
(APIServer pid=823) vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=823) File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 1468, in create_engine_config
(APIServer pid=823) maybe_override_with_speculators(
(APIServer pid=823) File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/vllm/transformers_utils/config.py", line 520, in maybe_override_with_speculators
(APIServer pid=823) config_dict, _ = PretrainedConfig.get_config_dict(
(APIServer pid=823) File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 662, in get_config_dict
(APIServer pid=823) config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
(APIServer pid=823) File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 753, in _get_config_dict
(APIServer pid=823) config_dict = load_gguf_checkpoint(resolved_config_file, return_tensors=False)["config"]
(APIServer pid=823) File "/home/ubuntu_wsl/vllm/.venv/lib/python3.10/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 431, in load_gguf_checkpoint
(APIServer pid=823) raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.")
(APIServer pid=823) ValueError: GGUF model with architecture qwen35moe is not supported yet.
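As an aside, the WARNING above about the ulimit of 1024 is separate from the crash. Per the log's own suggestion, the soft file-descriptor limit can be raised for the session; a sketch, assuming a bash shell inside WSL (a persistent change would instead go through /etc/security/limits.conf):

```shell
# Raise the soft limit on open file descriptors for this shell session.
# The hard limit caps what an unprivileged shell may set, so fall back
# to the hard limit if 65535 is not permitted.
ulimit -n 65535 2>/dev/null || ulimit -n "$(ulimit -Hn)"
ulimit -n  # show the resulting soft limit
```

Note that `vllm serve` must then be launched from the same shell for the new limit to apply.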