I have a preprocessed dataset in which the tokens are already separated by whitespace, so I only need a very simple tokenizer to load it. Is there any advice on how to create one?
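
For reference, here is a rough sketch of the kind of tokenizer I mean, written in plain Python. The class name `WhitespaceTokenizer`, its method names, and the `[UNK]` token are hypothetical choices for illustration, not part of any library. It assumes the vocabulary can be built from the dataset itself and that unknown tokens should map to a single placeholder id.

```python
from collections import Counter


class WhitespaceTokenizer:
    """Hypothetical minimal tokenizer: splits on whitespace, maps tokens to ids."""

    def __init__(self, unk_token="[UNK]"):
        # The unknown-token string is an assumption; id 0 is reserved for it.
        self.unk_token = unk_token
        self.token_to_id = {unk_token: 0}
        self.id_to_token = [unk_token]

    def train(self, lines):
        # Count every whitespace-separated token in the corpus.
        counts = Counter(tok for line in lines for tok in line.split())
        # Assign ids by descending frequency, ties broken alphabetically,
        # so the vocabulary is deterministic across runs.
        for tok, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
            if tok not in self.token_to_id:
                self.token_to_id[tok] = len(self.id_to_token)
                self.id_to_token.append(tok)

    def encode(self, text):
        # Tokens not seen during training fall back to the unknown id.
        unk_id = self.token_to_id[self.unk_token]
        return [self.token_to_id.get(tok, unk_id) for tok in text.split()]

    def decode(self, ids):
        # Join tokens with single spaces; original spacing is not preserved.
        return " ".join(self.id_to_token[i] for i in ids)


if __name__ == "__main__":
    tokenizer = WhitespaceTokenizer()
    tokenizer.train(["the cat sat", "the dog sat"])
    ids = tokenizer.encode("the cat flew")
    print(ids)
    print(tokenizer.decode(ids))
```

The Hugging Face `tokenizers` library also ships a `WordLevel` model and a `WhitespaceSplit` pre-tokenizer, which might be the library-native way to get the same behavior, but I am not sure how best to combine them.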