tokenizer_config.json missing chat_template field (tool calling broken without workaround)
The tokenizer_config.json in this repo does not include the chat_template field. The official NVIDIA variants (nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4, nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8) include a 10,504-character chat_template in their tokenizer_config.json that contains the Jinja2 template with tool-calling XML formatting, reasoning/thinking control (enable_thinking), and multi-turn conversation handling.
This repo does ship chat_template.jinja as a standalone file (same content, 10,504 bytes), which Transformers 4.43+ auto-discovers. So newer Transformers versions work around the gap. However, older Transformers versions or tools that read tokenizer_config.json directly will not find a chat template, which can cause:
- Silent fallback to a generic template without tool-calling support
- tokenizer.apply_chat_template() failures
- Broken tool calling when served via vLLM or other inference engines on older stacks
Comparison:
- This repo's tokenizer_config.json: 177,180 bytes, no chat_template key
- NVIDIA NVFP4's tokenizer_config.json: 188,049 bytes, includes chat_template (10,504 chars)
- All other fields are identical between the two files
Suggested fix: Copy the chat_template field from nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4's tokenizer_config.json into this repo's tokenizer_config.json. The content is identical to what's already in your chat_template.jinja file.
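Since the template content is already in the repo, the fix can be applied by inlining the standalone file into the config. A minimal sketch, assuming a local checkout of the repo (the function name is illustrative):

```python
import json
from pathlib import Path


def inline_chat_template(repo_dir: str) -> None:
    """Copy the standalone chat_template.jinja into tokenizer_config.json
    so tools that only read the config file also find the template."""
    repo = Path(repo_dir)
    config_path = repo / "tokenizer_config.json"
    config = json.loads(config_path.read_text())
    # Same content as the NVFP4 variant's chat_template field.
    config["chat_template"] = (repo / "chat_template.jinja").read_text()
    config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False))
```

Note that this rewrites the whole config file, so diff the result before committing to confirm nothing else changed.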
If the config is not updated, a fallback is to pin a minimum Transformers / vLLM version so the standalone `chat_template.jinja` is guaranteed to be auto-discovered.
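For example, a requirements pin along these lines (a sketch; the exact vLLM floor depends on your serving stack and should be checked against its release notes):

```text
# requirements.txt
transformers>=4.43   # auto-discovers the standalone chat_template.jinja
# vllm: pin a release new enough to pick up chat_template.jinja as well
```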
Thanks for reporting @seanthomaswilliams ! I updated the tokenizer_config.json with the version of the NVFP4 quant. 🫡