I am trying to train google/long-t5-local-base to generate some demo data for me. I wrote a function that tokenizes my training data and adds the new tokens to a tokenizer, then saves it. When I tried to load that tokenizer in my training loop, it complained that no config.json file existed. I then copied config.json over from the Hugging Face repo, but nothing changed.
How can I get the tokenizer to load properly? Originally, I had the following files:
- added_tokens.json
- special_tokens_map.json
- tokenizer.json
- tokenizer_config.json
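A minimal reproduction of the error I'm seeing, assuming the load goes through `AutoTokenizer` (the directory name and the stripped-down `tokenizer_config.json` below are placeholders, not my real files, which also contain the other three JSONs listed above):

```python
import json
import pathlib

from transformers import AutoTokenizer

# Placeholder directory standing in for my saved-tokenizer folder.
d = pathlib.Path("demo_tokenizer")
d.mkdir(exist_ok=True)

# A tokenizer_config.json with no "tokenizer_class" key, like mine.
(d / "tokenizer_config.json").write_text(json.dumps({"model_max_length": 512}))

try:
    AutoTokenizer.from_pretrained(str(d))
except OSError as e:
    # Fails complaining that the directory has no config.json
    print(e)
```

Is `from_pretrained` supposed to need the model's config.json here, or is something missing from my tokenizer files?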