Every time I instantiate the tokenizer using
self.tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L12-v2")
it makes a request to the Hugging Face Hub to download the tokenizer:
[connectionpool.py:474] - https://huggingface.co:443 "HEAD /sentence-transformers/all-MiniLM-L12-v2/resolve/main
It seems the files are not being cached locally, and I can't find a way to specify a cache_dir for the tokenizer. You can with AutoModel, but apparently not with AutoTokenizer.
How do I prevent this?
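For context, here is a sketch of a workaround I'm considering: setting the offline environment variables before importing transformers, so the library skips the HEAD request and uses only locally cached files. I'm not sure these are the right variable names for the version I'm on, so treat them as an assumption:

```python
import os

# Assumption: setting these BEFORE importing transformers makes the
# library skip the HEAD request and rely on the local cache only.
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # transformers-level offline flag
os.environ["HF_HUB_OFFLINE"] = "1"        # huggingface_hub-level offline flag

# from transformers import AutoTokenizer  # import must come after the env vars
# tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L12-v2")

print(os.environ["HF_HUB_OFFLINE"])
```

Is this the recommended approach, or is there a proper way to point AutoTokenizer at a local cache?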