update fixed tokenizer.json for v5
#1
by itazap HF Staff - opened
As a llama tokenizer, trie model was trained with a bug. Following a deprecation cycle and changes in transformers v5, we no longer have legacy flags to fix this so we need to update the file!
The only change is:
self._tokenizer.pre_tokenizer = None
self._tokenizer.normalizer = normalizers.Sequence(
[normalizers.Prepend(prepend="β"), normalizers.Replace(pattern=" ", content="β")]
)
in LlamaTokenizer, then save_pretrained
itazap changed pull request status to closed
itazap changed pull request status to open