Hello, I created a custom tokenizer called ProteinTokenizer that inherits from PreTrainedTokenizer. I saved it using save_pretrained, but when I try to load it in the following way, I get an error. I've been trying to debug this for hours, but I'm not sure what's going on. I can't tell whether I'm naming something incorrectly, skipping an important step, or something else. Would love some help!
Here are the details:
I run:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "samirchar/test_dayhoff",
    subfolder="jamba-170m-seqsam-36w",
    trust_remote_code=True,
    from_slow=True,
)
```
The error:
My repo structure looks like this:
Inside tokenizer_config.json, I've added this:
The `__init__.py` is empty.
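To rule out a naming mismatch between the config and the code, a quick local check can help. This is a minimal sketch, not part of transformers: the helper name `check_auto_map` is hypothetical, and it assumes the usual `auto_map` layout for remote-code tokenizers, i.e. `"auto_map": {"AutoTokenizer": ["module_name.ClassName", null]}` (entries optionally prefixed with `repo_id--`), with `module_name.py` sitting next to `tokenizer_config.json` in the same folder.

```python
import json
import re
from pathlib import Path


def check_auto_map(tokenizer_dir):
    """Hypothetical helper: list problems in tokenizer_config.json's auto_map.

    Assumes AutoTokenizer entries look like "module_name.ClassName",
    optionally prefixed with "repo_id--", and that module_name.py lives
    in the same directory as the config. Returns [] if nothing looks wrong.
    """
    tokenizer_dir = Path(tokenizer_dir)
    config_path = tokenizer_dir / "tokenizer_config.json"
    if not config_path.is_file():
        return [f"missing {config_path.name}"]

    config = json.loads(config_path.read_text())
    entry = config.get("auto_map", {}).get("AutoTokenizer")
    if entry is None:
        return ["no auto_map.AutoTokenizer entry"]
    # Assumed layout: [slow_class, fast_class]; either slot may be null.
    if isinstance(entry, str):
        entry = [entry, None]

    problems = []
    for ref in entry:
        if ref is None:
            continue
        ref = ref.split("--")[-1]  # drop an optional "repo_id--" prefix
        module_name, _, class_name = ref.rpartition(".")
        if not module_name:
            problems.append(f"{ref!r} is not in module.Class form")
            continue
        module_file = tokenizer_dir / f"{module_name}.py"
        if not module_file.is_file():
            problems.append(f"{module_file.name} not found")
            continue
        # Look for a top-level "class ClassName" definition in the module.
        pattern = rf"^class {re.escape(class_name)}\b"
        if not re.search(pattern, module_file.read_text(), re.MULTILINE):
            problems.append(f"class {class_name} not defined in {module_file.name}")
    return problems
```

Run against a local copy of the subfolder, e.g. `print(check_auto_map("jamba-170m-seqsam-36w"))`; an empty list only means the names line up, not that the tokenizer itself loads correctly.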
Thank you!