Tokenizers Wheel Takes Forever to Build

Hey all - I have a Docker image that deploys a model using transformers on Google Cloud Run. Here's what my Dockerfile looks like:

FROM python:3.10-slim

ENV PYTHONUNBUFFERED True

#set up environment
RUN apt-get update && apt-get install --no-install-recommends --no-install-suggests -y curl
RUN apt-get install -y unzip
RUN apt-get -y install python3
RUN apt-get -y install python3-pip

RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"

ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./

RUN pip3 install torch --extra-index-url https://download.pytorch.org/whl/cpu
RUN pip3 install --no-cache-dir -r requirements.txt


CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 app:app

This does build properly, but the build takes extremely long on Google Cloud Run (30+ minutes). Specifically, it gets stuck on the "Building wheel for tokenizers (pyproject.toml)" step.
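
That log line suggests pip is compiling tokenizers from source with the Rust toolchain rather than installing a prebuilt wheel. One way to check this would be to swap the requirements install line for something like the sketch below. It relies on pip's `--only-binary` option, which makes pip refuse source distributions for the named package. Whether a compatible prebuilt wheel exists for the tokenizers version that requirements.txt resolves to is an assumption I have not verified.

```dockerfile
# Diagnostic sketch, not a confirmed fix: forbid source builds of tokenizers.
# If no compatible prebuilt wheel is available for the requested version,
# pip should fail quickly here instead of spending the build compiling Rust.
# If the version is not pinned, pip may instead pick another version that has a wheel.
RUN pip3 install --no-cache-dir --only-binary=tokenizers -r requirements.txt
```

If this step fails right away, that would point to the version in requirements.txt having no wheel for this Python version and platform, which would explain the long compile.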

Do you have any idea why this takes so long or if there's anything that can be done to speed it up?