Some HF operations take an excessively long time to complete

Hi,

I am trying to adapt the notebook "Getting started with NLP for absolute beginners" by Jeremy Howard to another use case (lithology classification, mentioned only for background).

I am working on my laptop and have set up a dedicated Hugging Face Jupyter conda kernel (listed at the end of this post for reference).

My network connection (at my workplace) is rather fast and working fine, but some Hugging Face operations are taking a very long time to complete, and I am puzzled as to why. I cannot recall the original notebook taking this long to execute the same operations. I am new to Hugging Face, so any guidance or feedback would be appreciated.

A few things I observed using IPython's %%time cell magic:

model_nm = 'microsoft/deberta-v3-small'
tokz = AutoTokenizer.from_pretrained(model_nm)
CPU times: user 504 ms, sys: 57.9 ms, total: 562 ms
Wall time: 14min 13s
model = AutoModelForSequenceClassification.from_pretrained(model_nm, num_labels=num_labels) 
Downloading: 100%
273M/273M [00:25<00:00, 11.6MB/s]
CPU times: user 4.38 s, sys: 1.17 s, total: 5.55 s
Wall time: 19min 16s
from datasets import load_metric
metric = load_metric("accuracy")
CPU times: user 92.7 ms, sys: 70.7 ms, total: 163 ms
Wall time: 6min 41s

Background: my conda environment was created with:

conda create -n hf python=3.9 -c conda-forge
myenv=hf
mamba install -n $myenv -c conda-forge ipykernel matplotlib
mamba install -n $myenv -c huggingface -c conda-forge datasets transformers
mamba install -n $myenv pytorch=1.11 -c pytorch -c nvidia -c conda-forge
mamba install -n $myenv torchvision torchaudio -c pytorch -c nvidia -c conda-forge
mamba install -n $myenv -c conda-forge sentencepiece

This is very strange behavior; there is a huge discrepancy between wall time and CPU time. Do you have the same issue if you run this in a regular Python script rather than in a notebook?

At first glance I would guess this is an I/O issue. Where are you storing the models/files? Is it possible that you have set up your cache on an external drive or something?
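
In case it helps, here is a minimal sketch (assuming the standard cache layout; HF_HOME and TRANSFORMERS_CACHE are the usual environment variables, and ~/.cache/huggingface is the usual default) to check where transformers will write its downloads:

    import os
    from pathlib import Path

    # Sketch: print the usual knobs that decide where transformers caches downloads.
    # If neither env var is set, the default location is ~/.cache/huggingface.
    print("HF_HOME            =", os.environ.get("HF_HOME"))
    print("TRANSFORMERS_CACHE =", os.environ.get("TRANSFORMERS_CACHE"))
    default_cache = Path.home() / ".cache" / "huggingface"
    print("default cache dir  =", default_cache, "| exists:", default_cache.exists())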


I’ve tried to replicate the issue on another PC on the same LAN, but a Windows 10 box.

The completion time of AutoTokenizer.from_pretrained('microsoft/deberta-v3-small') is much more acceptable (10 s). On Linux, using a very similar conda environment set up on a local drive, it still stubbornly takes 14 minutes.

I’ve taken some more notes for possible repro:

set myenv=hf
mamba create -n %myenv% python=3.9 -c conda-forge
mamba install -n %myenv% ipykernel matplotlib -c conda-forge
mamba install -n %myenv% -c pytorch -c conda-forge --file nbdev.txt
mamba install -n %myenv% pytorch=1.11 -c pytorch -c nvidia -c conda-forge
mamba install -n %myenv% torchvision torchaudio -c pytorch -c nvidia -c conda-forge
mamba install -n %myenv% -c conda-forge sentencepiece
mamba install -n %myenv% -c huggingface -c conda-forge datasets transformers scikit-learn

mamba activate %myenv%
python -m ipykernel install --user --name %myenv% --display-name "Hugging Face"
# %%
from transformers import AutoTokenizer # takes ~30 seconds, but only the first time; pretty quick afterwards
# %%
model_nm = 'microsoft/deberta-v3-small'
# %%
tokz = AutoTokenizer.from_pretrained(model_nm)  # 9 seconds on Windows, 14 minutes on Linux

The Windows OS is:

Edition	Windows 10 Enterprise
Version	21H2
OS build	19044.1645
Experience	Windows Feature Experience Pack 120.2212.4170.0

The Linux OS is:

Linux blahblahblah 5.10.0-14-amd64 #1 SMP Debian 5.10.113-1 (2022-04-29) x86_64 GNU/Linux

I am scratching my head. The only difference I can think of is that on Linux I also installed some fast.ai libraries in the conda env, notably fastcore, as well as pinning:

nbformat>=4.4.0
nbconvert>=6.1
pyyaml
jupyter 
jupyter_client<8 
ghapi 
Jinja2<3.1.0

I think all I can do now is recreate the exact same conda env from scratch on Linux and see if the issue goes away.

I hope this helps anyone out there who bumps into the same issue. I may open a GitHub issue once I feel I’ve ruled out problems within my control.

This was a rabbit hole, but I have diagnosed and sorted it out.

I won’t get these hours of my life back, but I sincerely hope this will save some for a future reader.

TL;DR

  • Are you on a network configured with both IPv6 and IPv4?
  • Have you tried turning off your WiFi connection and keeping only your Ethernet one? That resolved the issue for me.
  • I am not sure whether this set of circumstances is common enough to warrant design changes in HF; I may have hit a very unlikely scenario.

Detailed diagnosis

The call stack was:

create_connection (/path/to/condaenv/urllib3/util/connection.py:86)
_new_conn (/path/to/condaenv/urllib3/connection.py:174)
connect (/path/to/condaenv/urllib3/connection.py:358)
_validate_conn (/path/to/condaenv/urllib3/connectionpool.py:1040)
_make_request (/path/to/condaenv/urllib3/connectionpool.py:386)
urlopen (/path/to/condaenv/urllib3/connectionpool.py:703)
send (/path/to/condaenv/requests/adapters.py:489)
send (/path/to/condaenv/requests/sessions.py:701)
request (/path/to/condaenv/requests/sessions.py:587)
request (/path/to/condaenv/requests/api.py:59)
head (/path/to/condaenv/requests/api.py:100)
get_from_cache (/path/to/condaenv/transformers/file_utils.py:1573)
cached_path (/path/to/condaenv/transformers/file_utils.py:1402)
get_config_dict (/path/to/condaenv/transformers/configuration_utils.py:546)
from_pretrained (/path/to/condaenv/transformers/models/auto/configuration_auto.py:527)
from_pretrained (/path/to/condaenv/transformers/models/auto/tokenization_auto.py:463)
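
(For what it's worth, a stack like this can also be captured without a debugger. A minimal sketch using the standard-library faulthandler module, under the assumption that the call is still hanging after 60 seconds; it is easiest to run as a plain Python script rather than inside the notebook:)

    import faulthandler
    from transformers import AutoTokenizer

    # Sketch: if from_pretrained is still busy after 60 seconds, dump the
    # tracebacks of all threads to stderr to see where it is stuck.
    faulthandler.dump_traceback_later(60)
    tokz = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
    faulthandler.cancel_dump_traceback_later()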

In the urllib3 package, create_connection has a loop that tries to create socket connections:

    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
        af, socktype, proto, canonname, sa = res
        sock = None
        try:
            # snip: the socket is created here and a timeout (10 seconds in this case) is set
            sock.connect(sa)

The output of socket.getaddrinfo(host, port, family, socket.SOCK_STREAM) is (censored with fake addresses):

(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff00:...:b1f2:e3a4', 443, 0, 0))
(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff50:...:8142:2344', 443, 0, 0))
(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff00:...:2102:d3a4', 443, 0, 0))
(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff50:...:d122:d3b4', 443, 0, 0))
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1.5.66.77', 443))
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('2.5.65.77', 443))
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('3.5.64.77', 443))
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('4.5.63.77', 443))

It looks like the one that succeeds is the 6th, (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('2.5.65.77', 443)), meaning every entry before it has to fail first, the IPv6 ones presumably by running into the 10-second timeout.
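
For anyone wanting to check this on their own machine, a standalone probe along these lines (my own sketch, independent of transformers; the host name and the 10-second timeout are just illustrative) shows which of the resolved addresses stall:

    import socket
    import time

    # Sketch: try to connect to each address getaddrinfo returns for
    # huggingface.co:443 and report how long every attempt takes.
    host, port = "huggingface.co", 443
    for family, socktype, proto, _, sa in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(10)  # same order of magnitude as the timeout observed above
        start = time.time()
        try:
            sock.connect(sa)
            outcome = "connected"
        except OSError as exc:
            outcome = f"failed ({exc})"
        finally:
            sock.close()
        print(f"{sa[0]:>42}  {outcome}  after {time.time() - start:.1f}s")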

transformers/file_utils initiates at least one connection, in get_from_cache (/path/to/condaenv/transformers/file_utils.py:1573):

    if not local_files_only:
        try:
            r = requests.head(url, headers=headers, allow_redirects=False, proxies=proxies, timeout=etag_timeout)

etag_timeout is 10, evidently 10 seconds, which may be a sensible default.

I have not counted how many connection attempts are made overall, but presumably there are many more than the roughly 6 * 10 seconds' worth triggered by a single get_from_cache call.

I have not debugged running all this as root, but presumably the first attempt at a socket connection just succeeds.
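
As a side note, the local_files_only branch quoted above also suggests a partial mitigation once the files have been downloaded at least once: skip the online etag check entirely. A sketch (assuming the model is already in the local cache):

    from transformers import AutoTokenizer

    # Sketch: with the files already cached locally, this skips the
    # requests.head call (and hence the slow socket attempts) altogether.
    tokz = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small", local_files_only=True)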

If I turn off the WiFi adapter…

…the list of socket addresses to try comes back in a different order; importantly, the IPv4 addresses are listed first:

(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('3.5.64.77', 443))
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('2.5.65.77', 443))
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1.5.66.77', 443))
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('4.5.63.77', 443))

(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff50:...:8142:2344', 443, 0, 0))
(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff00:...:b1f2:e3a4', 443, 0, 0))
(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff00:...:2102:d3a4', 443, 0, 0))
(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff50:...:d122:d3b4', 443, 0, 0))

Interesting find! Thanks for digging through it, and thanks for reporting your findings here!


Note that there was some outage on the Model Hub this weekend due to DDoS attacks, so it is possible that this affected response times for the requests.

Hi, I had the same problem. Another workaround is to disable IPv6; it seems to be a problem in urllib3:

$ sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
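
If that fixes it and you want the setting to survive a reboot, the usual approach is a sysctl drop-in file (the filename below is just an example), followed by sudo sysctl --system or a reboot:

# /etc/sysctl.d/99-disable-ipv6.conf (example filename)
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1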