This was a rabbit hole, but I diagnosed and sorted this out.
I won’t get these hours of my life back, but I sincerely hope this will save some for a future reader.
TL;DR
- Are you on a network configured with IPv6 and IPv4?
- Try turning off your WiFi connection and keeping only your ethernet one. That resolved the issue for me.
- I am not sure if this is a set of circumstances likely enough to warrant some design changes in HF. I may have hit a very unlikely scenario.
Detailed diagnosis
The call stack was:
create_connection (/path/to/condaenv/urllib3/util/connection.py:86)
_new_conn (/path/to/condaenv/urllib3/connection.py:174)
connect (/path/to/condaenv/urllib3/connection.py:358)
_validate_conn (/path/to/condaenv/urllib3/connectionpool.py:1040)
_make_request (/path/to/condaenv/urllib3/connectionpool.py:386)
urlopen (/path/to/condaenv/urllib3/connectionpool.py:703)
send (/path/to/condaenv/requests/adapters.py:489)
send (/path/to/condaenv/requests/sessions.py:701)
request (/path/to/condaenv/requests/sessions.py:587)
request (/path/to/condaenv/requests/api.py:59)
head (/path/to/condaenv/requests/api.py:100)
get_from_cache (/path/to/condaenv/transformers/file_utils.py:1573)
cached_path (/path/to/condaenv/transformers/file_utils.py:1402)
get_config_dict (/path/to/condaenv/transformers/configuration_utils.py:546)
from_pretrained (/path/to/condaenv/transformers/models/auto/configuration_auto.py:527)
from_pretrained (/path/to/condaenv/transformers/models/auto/tokenization_auto.py:463)
In the urllib3 package, create_connection has a loop that tries to create a socket connection to each resolved address in turn:
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
    af, socktype, proto, canonname, sa = res
    sock = None
    try:
        # snip, some things including setting a timeout=10 seconds
        sock.connect(sa)
The result of socket.getaddrinfo(host, port, family, socket.SOCK_STREAM) consists of the following entries (censored with fake addresses):
(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff00:...:b1f2:e3a4', 443, 0, 0))
(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff50:...:8142:2344', 443, 0, 0))
(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff00:...:2102:d3a4', 443, 0, 0))
(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff50:...:d122:d3b4', 443, 0, 0))
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1.5.66.77', 443))
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('2.5.65.77', 443))
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('3.5.64.77', 443))
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('4.5.63.77', 443))
It looks like the one that succeeds is the 6th, (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('2.5.65.77', 443)). The four IPv6 entries and the first IPv4 entry are tried before it.
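To reproduce this kind of diagnosis without a debugger, the loop can be imitated outside urllib3. The following is a minimal sketch, not urllib3's actual code: probe_addresses is a hypothetical helper name, and the host in the usage comment is an assumption to replace with whatever your failing call connects to.

```python
import socket
import time


def probe_addresses(addrinfos, timeout=10.0):
    """Try each getaddrinfo()-style entry in order.

    Returns one (family_name, sockaddr, succeeded, seconds) tuple per entry.
    """
    report = []
    for af, socktype, proto, _canonname, sa in addrinfos:
        start = time.monotonic()
        succeeded = True
        try:
            # The socket is closed automatically when the block exits.
            with socket.socket(af, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sa)
        except OSError:
            # Covers timeouts, refused connections and unreachable networks.
            succeeded = False
        elapsed = time.monotonic() - start
        report.append((socket.AddressFamily(af).name, sa, succeeded, elapsed))
    return report


# Example usage (hypothetical host, performs real network connections):
# infos = socket.getaddrinfo("huggingface.co", 443, socket.AF_UNSPEC, socket.SOCK_STREAM)
# for family, sa, ok, seconds in probe_addresses(infos, timeout=10.0):
#     print(f"{family:9} {sa[0]:40} ok={ok} {seconds:.1f}s")
```

Entries that report ok=False with an elapsed time close to the timeout are the ones holding the loop up.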
transformers/file_utils has at least one place where a connection is initiated: get_from_cache (/path/to/condaenv/transformers/file_utils.py:1573)
if not local_files_only:
    try:
        r = requests.head(url, headers=headers, allow_redirects=False, proxies=proxies, timeout=etag_timeout)
etag_timeout is 10, evidently 10 seconds, which may be a sensible default.
I have not looked at how many connection attempts are made in total, but presumably the overall delay is far more than the roughly 6 * 10 seconds triggered by a single get_from_cache call.
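As a rough back-of-the-envelope check on that figure, here is a small sketch of the arithmetic. It assumes every unreachable address blocks for the full timeout instead of failing fast, which I have not verified; the function name is made up for illustration.

```python
def seconds_before_first_success(position_of_first_success, timeout_seconds=10.0):
    """Upper bound on time spent in failed attempts before the first success.

    position_of_first_success is 1-based: 6 means the 6th address connects.
    """
    failed_attempts = position_of_first_success - 1
    return failed_attempts * timeout_seconds


print(seconds_before_first_success(6))  # 50.0
```

So with the 6th address being the first to connect, a single request can spend up to about 50 seconds on timeouts before the connection that works, in line with the rough figure above.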
I have not debugged running all this as root, but presumably the first attempt at a socket connection just succeeds.
If I turn off the WiFi adapter…
…the list of socket addresses to try is in a different order. Importantly, the IPv4 addresses are listed first:
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('3.5.64.77', 443))
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('2.5.65.77', 443))
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1.5.66.77', 443))
(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('4.5.63.77', 443))
(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff50:...:8142:2344', 443, 0, 0))
(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff00:...:b1f2:e3a4', 443, 0, 0))
(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff00:...:2102:d3a4', 443, 0, 0))
(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('1234:5555:6543:ff50:...:d122:d3b4', 443, 0, 0))
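For illustration, the same IPv4-first ordering can be produced in code rather than by switching adapters. This is only a sketch of the idea: ipv4_first is a hypothetical helper, not something urllib3 or transformers provides, and the addresses in the usage comment are placeholder documentation addresses, not the ones from my network.

```python
import socket


def ipv4_first(addrinfos):
    """Return getaddrinfo()-style entries with AF_INET ones moved to the front.

    sorted() is stable, so the relative order inside each family is kept.
    """
    return sorted(addrinfos, key=lambda res: res[0] != socket.AF_INET)


# Example usage with placeholder addresses:
# mixed = [
#     (socket.AF_INET6, socket.SOCK_STREAM, 6, "", ("2001:db8::1", 443, 0, 0)),
#     (socket.AF_INET, socket.SOCK_STREAM, 6, "", ("192.0.2.1", 443)),
# ]
# for entry in ipv4_first(mixed):
#     print(entry)
```

If IPv6 is not needed at all, passing socket.AF_INET as the family argument to socket.getaddrinfo restricts the results to IPv4 entries in the first place.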