Every time I instantiate the tokenizer using
self.tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L12-v2")
it makes a request to the Hugging Face Hub to download the tokenizer:
[connectionpool.py:474] - https://huggingface.co:443 "HEAD /sentence-transformers/all-MiniLM-L12-v2/resolve/main
It seems the files are not being cached locally, and I can't find a way to specify a cache_dir for the tokenizer. You can with AutoModel, but apparently not with AutoTokenizer.
How do I prevent this?
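For context, here is a sketch of a workaround I'm considering: setting the offline environment variables before importing transformers, so the library skips the HEAD request and uses only locally cached files. I'm not sure these are the right variable names for the version I'm on, so treat them as an assumption:

```python
import os

# Assumption: setting these BEFORE importing transformers makes the
# library skip the HEAD request and rely on the local cache only.
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # transformers-level offline flag
os.environ["HF_HUB_OFFLINE"] = "1"        # huggingface_hub-level offline flag

# from transformers import AutoTokenizer  # import must come after the env vars
# tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L12-v2")

print(os.environ["HF_HUB_OFFLINE"])
```

Is this the recommended approach, or is there a proper way to point AutoTokenizer at a local cache?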