Hi everyone,
sorry, but I would be so thankful if someone could take a look at my problem.
I already read the other discussions and didn't find my problem…
I'm a beginner with Hugging Face, so please be nice. I have already installed different models, and most of them work fine. But not this one: "google/mt5-base".
I'm loading it like this:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")
model = AutoModel.from_pretrained("google/mt5-base")
and get this error:
OSError: Unable to load vocabulary from file. Please check that the provided vocabulary is accessible and not corrupted.
OSError                                   Traceback (most recent call last)
c:\Anaconda3\lib\site-packages\transformers\tokenization_utils_base.py in _from_pretrained(cls, resolved_vocab_files, pretrained_model_name_or_path, init_configuration, use_auth_token, cache_dir, local_files_only, _commit_hash, *init_inputs, **kwargs)
   1957             try:
-> 1958                 tokenizer = cls(*init_inputs, **init_kwargs)
   1959             except OSError:

c:\Anaconda3\lib\site-packages\transformers\models\t5\tokenization_t5.py in __init__(self, vocab_file, eos_token, unk_token, pad_token, extra_ids, additional_special_tokens, sp_model_kwargs, **kwargs)
    153         self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
->  154         self.sp_model.Load(vocab_file)
    155

c:\Anaconda3\lib\site-packages\sentencepiece\__init__.py in Load(self, model_file, model_proto)
    366             return self.LoadFromSerializedProto(model_proto)
->  367         return self.LoadFromFile(model_file)
    368

c:\Anaconda3\lib\site-packages\sentencepiece\__init__.py in LoadFromFile(self, arg)
    170     def LoadFromFile(self, arg):
->  171         return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
    172

OSError: Not found: "C:\Users\clööd/.cache\huggingface\hub\models--google--mt5-small\snapshots\38f23af8ec210eb6c376d40e9c56bd25a80f195d\spiece.model": No such file or directory Error #2

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_3400\1414250741.py in <module>
----> 1 tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
      2 model = AutoModel.from_pretrained("google/mt5-small")

c:\Anaconda3\lib\site-packages\transformers\models\auto\tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    677                 f"Tokenizer class {tokenizer_class_candidate} does not exist or is not currently imported."
    678             )
--> 679         return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
    680
    681     # Otherwise we have to be creative.

c:\Anaconda3\lib\site-packages\transformers\tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
   1802                 logger.info(f"loading file {file_path} from cache at {resolved_vocab_files[file_id]}")
   1803
-> 1804         return cls._from_pretrained(
   1805             resolved_vocab_files,
   1806             pretrained_model_name_or_path,

c:\Anaconda3\lib\site-packages\transformers\tokenization_utils_base.py in _from_pretrained(cls, resolved_vocab_files, pretrained_model_name_or_path, init_configuration, use_auth_token, cache_dir, local_files_only, _commit_hash, *init_inputs, **kwargs)
   1832         has_tokenizer_file = resolved_vocab_files.get("tokenizer_file", None) is not None
   1833         if (from_slow or not has_tokenizer_file) and cls.slow_tokenizer_class is not None:
-> 1834             slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
   1835                 copy.deepcopy(resolved_vocab_files),
   1836                 pretrained_model_name_or_path,

c:\Anaconda3\lib\site-packages\transformers\tokenization_utils_base.py in _from_pretrained(cls, resolved_vocab_files, pretrained_model_name_or_path, init_configuration, use_auth_token, cache_dir, local_files_only, _commit_hash, *init_inputs, **kwargs)
   1958             tokenizer = cls(*init_inputs, **init_kwargs)
   1959         except OSError:
-> 1960             raise OSError(
   1961                 "Unable to load vocabulary from file. "
   1962                 "Please check that the provided vocabulary is accessible and not corrupted."

OSError: Unable to load vocabulary from file. Please check that the provided vocabulary is accessible and not corrupted.
I have tried so much already:
- deleted the Hugging Face cache folder,
- downloaded the model files manually from git,
- downgraded Python to 3.8.5,
- uninstalled & reinstalled transformers, huggingface-hub, tensorflow, torch.
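For reference, here is the small stdlib-only sketch I used to check whether the cached spiece.model file actually exists and is non-empty (the helper name `vocab_file_ok` is just my own, not from transformers):

```python
from pathlib import Path

def vocab_file_ok(path) -> bool:
    """Return True if a tokenizer vocab file exists and is non-empty."""
    p = Path(path)
    return p.is_file() and p.stat().st_size > 0

# The path from the traceback fails this check on my machine:
print(vocab_file_ok("C:/no/such/cache/spiece.model"))  # False
```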
I don't understand why the installation works for other models, but not for this mt5?
Thanks in advance for any(!) help…
Infos:
- transformers version: 4.27.3
- Platform: Windows-10-10.0.19044-SP0
- Python version: 3.9.13
- Huggingface_hub version: 0.13.3
- PyTorch version (GPU?): 2.0.0+cpu (False)
- Tensorflow version (GPU?): 2.12.0 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: No?
- Using distributed or parallel set-up in script?: No?
- sentencepiece: 0.1.97
Additional:
Somehow I think the problem is around the tokenizer, because when I try to load a model that does work, e.g. "google/flan-t5-base", it only works with AutoTokenizer, not with T5Tokenizer…