I came across this short code snippet posted by HuggingFace on LinkedIn, introducing tokenizers 0.9.
LinkedIn URL: snippet for tokenizers 0.9
How do I get the following dataset so that I can run the code snippet? Is it available on huggingface.datasets?
files = ["../../data/wiki-big.train.raw"]