update fixed tokenizer.json for v5

by itazap HF Staff - opened 12 days ago

base: refs/heads/main

←

from: refs/pr/1

Discussion Files changed

+244998

-61250

update fixed tokenizer.json for v58d1306b1

itazap

12 days ago

•

edited 12 days ago

As a llama tokenizer, trie model was trained with a bug. Following a deprecation cycle and changes in transformers v5, we no longer have legacy flags to fix this so we need to update the file!

The only change is:

        self._tokenizer.pre_tokenizer = None
        self._tokenizer.normalizer = normalizers.Sequence(
            [normalizers.Prepend(prepend="▁"), normalizers.Replace(pattern=" ", content="▁")]
        )

in LlamaTokenizer, then save_pretrained

itazap changed pull request status to closed 12 days ago

itazap changed pull request status to open 12 days ago

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

Ready to merge

This branch is ready to get merged automatically.

· Sign up or log in to comment