I built a tokenizer and trained a language model from scratch following this link.
Then I used this to train a CLIP model with that tokenizer. Training went fine.
Now, when I load the trained CLIP model for evaluation, I get this error:
ValueError: The backend_tokenizer provided does not match the expected format. The CLIP tokenizer has been heavily modified from transformers version 4.17.0. You need to convert the tokenizer you are using to be compatible with this version.The easiest way to do so is CLIPTokenizerFast.from_pretrained("path_to_local_folder_or_hub_repo, from_slow=True). If you want to use your existing tokenizer, you will have to revert to a version prior to 4.17.0 of transformers.
When I load the tokenizer I get the same error:
tokenizer = CLIPTokenizerFast.from_pretrained("/home/user/ckpt10k")
tokenizer = CLIPTokenizerFast.from_pretrained("/home/user/ckpt10k", from_slow=True)
or
tokenizer = CLIPTokenizerFast.from_pretrained("/home/user/ckpt10k", from_slow=False)
The tokenizer class you load from this checkpoint is 'PreTrainedTokenizerFast'.
The class this function is called from is 'CLIPTokenizer'.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/miniforge-pypy3/envs/clip/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2089, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniforge-pypy3/envs/clip/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2123, in _from_pretrained
    slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniforge-pypy3/envs/clip/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2311, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniforge-pypy3/envs/clip/lib/python3.12/site-packages/transformers/models/clip/tokenization_clip.py", line 306, in __init__
    with open(vocab_file, encoding="utf-8") as vocab_handle:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected str, bytes or os.PathLike object, not NoneType
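The last frame suggests that vocab_file is None, which would mean the slow CLIPTokenizer did not find its vocabulary file in the checkpoint folder. Below is a minimal sketch for checking which tokenizer files the folder actually contains. It assumes the usual file names (vocab.json and merges.txt for the slow tokenizer, tokenizer.json for a fast one); the helper name inspect_tokenizer_dir is hypothetical and not part of transformers.

```python
from pathlib import Path

# Assumed file names: the slow CLIPTokenizer reads vocab.json and merges.txt,
# while a fast tokenizer is serialized as a single tokenizer.json.
SLOW_FILES = ("vocab.json", "merges.txt")
FAST_FILE = "tokenizer.json"


def inspect_tokenizer_dir(ckpt_dir):
    """Report which tokenizer files exist in a checkpoint folder (hypothetical helper)."""
    folder = Path(ckpt_dir)
    present = {name: (folder / name).is_file() for name in (*SLOW_FILES, FAST_FILE)}
    return {
        "present": present,
        # Loading through the slow tokenizer needs both slow-tokenizer files.
        "slow_loadable": all(present[name] for name in SLOW_FILES),
        "fast_file_found": present[FAST_FILE],
    }


if __name__ == "__main__":
    # Path taken from the calls above; a missing folder simply reports all files absent.
    print(inspect_tokenizer_dir("/home/user/ckpt10k"))
```

If this reports only tokenizer.json, that would be consistent with the warning above that the checkpoint's tokenizer class is PreTrainedTokenizerFast, and with the slow-tokenizer path failing because there is no vocab.json to open.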
Can you please suggest the right way to load CLIP?