update fixed tokenizer.json for v5

#1
by itazap HF Staff - opened

As a Llama tokenizer, this model was trained with a bug. Following a deprecation cycle and the changes in transformers v5, the legacy flags that used to work around it are gone, so the tokenizer.json file itself needs to be updated.

The only change is:

        self._tokenizer.pre_tokenizer = None
        self._tokenizer.normalizer = normalizers.Sequence(
            [normalizers.Prepend(prepend="▁"), normalizers.Replace(pattern=" ", content="▁")]
        )

applied in LlamaTokenizer, after which the tokenizer was re-saved with save_pretrained.
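For readers who want to see what that normalizer sequence does to input text, here is a minimal pure-Python sketch. It does not call the tokenizers library; `llama_style_normalize` is a hypothetical helper written for illustration, and leaving empty input unchanged is an assumption about how the Prepend step behaves.

```python
# Illustrative emulation of the two normalizer steps shown above:
#   1. Prepend(prepend="▁")                -> put "▁" in front of the text
#   2. Replace(pattern=" ", content="▁")   -> turn every space into "▁"
# This is a sketch, not the tokenizers implementation.

SPACE_MARKER = "\u2581"  # the "▁" character used as a word-boundary marker


def llama_style_normalize(text: str) -> str:
    """Hypothetical helper mimicking Prepend followed by Replace."""
    if not text:
        # Assumption: empty input is left unchanged.
        return text
    return (SPACE_MARKER + text).replace(" ", SPACE_MARKER)


print(llama_style_normalize("Hello world"))  # ▁Hello▁world
```

With the pre-tokenizer set to None, the text reaches the model in this normalized form, with no further splitting beforehand.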

itazap changed pull request status to closed
itazap changed pull request status to open
