Hello, I am trying to run the model by following the Installation section of the manual, but I ran into a problem. My steps are described below.
Operating system: Ubuntu 20.04.6 installed on WSL
Updated the packages:
sudo apt update
sudo apt upgrade -y
Installed the build dependencies and Python 3.11.3 via pyenv:
sudo apt-get update
sudo apt-get install git curl python3-pip make gcc libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget llvm libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev python-openssl
curl https://pyenv.run | bash
export PATH="$HOME/.pyenv/bin:$PATH" && eval "$(pyenv init --path)" && echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n eval "$(pyenv init -)"\nfi' >> ~/.bashrc
pyenv install 3.11.3
pyenv global 3.11.3
Created and activated the virtual environment:
python -m venv .env
source .env/bin/activate
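One way to confirm that the virtual environment is actually the one in use is to compare the interpreter prefixes from inside Python. This is a minimal sketch; the helper name `in_virtualenv` is made up for illustration and is not part of any library.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv.

    Inside a venv, sys.prefix points at the environment directory,
    while sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
```

If this reports False after `source .env/bin/activate`, the `python` on the PATH is probably not the one from `.env`.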
Installed TensorFlow:
pip install --upgrade tensorflow
Installed PyTorch:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Installed Flax:
pip install --upgrade git+https://github.com/google/flax.git
Installed transformers:
pip install transformers
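To make the report easier to reproduce, the installed versions can be listed from the active environment. This is a sketch under the assumption that the distribution names match the pip commands above; `installed_versions` is a hypothetical helper, and `sentencepiece` is included only because the error message further down mentions it.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            result[name] = None
    return result


if __name__ == "__main__":
    # Assumed distribution names, taken from the pip commands in this post.
    names = ["tensorflow", "torch", "flax", "transformers", "sentencepiece"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```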
Next, I tried to download and load the model:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0_3B")
The line
tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B")
raises the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dave/.env/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 711, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dave/.env/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1812, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/dave/.env/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1975, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dave/.env/lib/python3.11/site-packages/transformers/models/t5/tokenization_t5_fast.py", line 133, in __init__
    super().__init__(
  File "/home/dave/.env/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 120, in __init__
    raise ValueError(
ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
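Going by the last line of the message, the `sentencepiece` package appears not to be importable in this environment (it is not among the packages installed above). A minimal check is sketched below; `is_installed` is a hypothetical helper name, and the check only tells whether the module can be found, not whether it works.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("sentencepiece"):
        print("sentencepiece is available")
    else:
        # Assumption: installing it into the same venv would satisfy the requirement.
        print("sentencepiece is missing; try: pip install sentencepiece")
```

If the package turns out to be missing, the Python session presumably has to be restarted after installing it before `AutoTokenizer.from_pretrained` is retried.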
What am I doing wrong?