tokenizer_config.json missing chat_template field (tool calling broken without workaround)
The tokenizer_config.json in this repo does not include the chat_template field. The official NVIDIA variants (nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4, nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8) include a 10,504-character chat_template in their tokenizer_config.json that contains the Jinja2 template with tool-calling XML formatting, reasoning/thinking control (enable_thinking), and multi-turn conversation handling.
This repo does ship chat_template.jinja as a standalone file (same content, 10,504 bytes), which Transformers 4.43+ auto-discovers. So newer Transformers versions work around the gap. However, older Transformers versions or tools that read tokenizer_config.json directly will not find a chat template, which can cause:
- Silent fallback to a generic template without tool-calling support
- tokenizer.apply_chat_template() failures
- Broken tool calling when served via vLLM or other inference engines on older stacks
Comparison:
- This repo's tokenizer_config.json: 177,180 bytes, no chat_template key
- NVIDIA NVFP4's tokenizer_config.json: 188,049 bytes, includes chat_template (10,504 chars)
- All other fields are identical between the two files
Suggested fix: Copy the chat_template field from nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4's tokenizer_config.json into this repo's tokenizer_config.json. The content is identical to what's already in your chat_template.jinja file.
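Since the template content is already in the repo, the fix can be applied by inlining the standalone file into the config. A minimal sketch, assuming a local checkout of the repo (the function name is illustrative):

```python
import json
from pathlib import Path


def inline_chat_template(repo_dir: str) -> None:
    """Copy the standalone chat_template.jinja into tokenizer_config.json
    so tools that only read the config file also find the template."""
    repo = Path(repo_dir)
    config_path = repo / "tokenizer_config.json"
    config = json.loads(config_path.read_text())
    # Same content as the NVFP4 variant's chat_template field.
    config["chat_template"] = (repo / "chat_template.jinja").read_text()
    config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False))
```

Note that this rewrites the whole config file, so diff the result before committing to confirm nothing else changed.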
If the config is not updated, a fallback is to pin a minimum Transformers / vLLM version so the standalone `chat_template.jinja` is guaranteed to be auto-discovered.
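For example, a requirements pin along these lines (a sketch; the exact vLLM floor depends on your serving stack and should be checked against its release notes):

```text
# requirements.txt
transformers>=4.43   # auto-discovers the standalone chat_template.jinja
# vllm: pin a release new enough to pick up chat_template.jinja as well
```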
Thanks for reporting @seanthomaswilliams ! I updated the tokenizer_config.json with the version of the NVFP4 quant. 🫡