I am using the Hugging Face text-generation-inference (TGI) Docker container.
I can successfully run a model, e.g.:
docker run -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.0.0 --model-id 'tiiuae/falcon-7b-instruct'
However, after running this model I have a question. The Features section of the README notes:
Quantization with bitsandbytes and GPT-Q
If that is right, how do I set the bitsandbytes configuration?
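My guess from the launcher's --help output is that quantization is enabled with a --quantize flag rather than a separate config file, so I would have tried something like the following (the flag name and value are my assumption, not something I have confirmed works):

```shell
# Same invocation as above, with quantization requested at launch.
# Assumption: the launcher accepts --quantize bitsandbytes (from `--help`).
docker run -p 8080:80 -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:1.0.0 \
  --model-id 'tiiuae/falcon-7b-instruct' \
  --quantize bitsandbytes
```

Is that the intended way, or are there additional bitsandbytes settings I should be passing?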
Then I tried this model:
TheBloke/Llama-2-13B-chat-GPTQ
and got this error:
asyncio.run(serve_inner(model_id, revision, sharded, quantize, trust_remote_code))
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 124, in serve_inner
    model = get_model(model_id, revision, sharded, quantize, trust_remote_code)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 246, in get_model
    return llama_cls(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_llama.py", line 58, in __init__
    filenames = weight_files(model_id, revision, ".bin")
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 95, in weight_files
    raise e
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 92, in weight_files
    filenames = weight_hub_files(model_id, revision, extension)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 33, in weight_hub_files
    raise EntryNotFoundError(
huggingface_hub.utils._errors.EntryNotFoundError: No .bin weights found for model TheB
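Reading the traceback, the server is looking for .bin weight files, and I suspect the GPTQ repo does not ship weights under that extension, so perhaps the launcher needs to be told explicitly that the model is GPTQ-quantized. Is something like this the right invocation? (The --quantize gptq value is my assumption from the Features section; I have not verified it.)

```shell
# Assumption: passing --quantize gptq tells the server to load the
# GPTQ-quantized weights instead of searching for standard .bin files.
docker run -p 8080:80 -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:1.0.0 \
  --model-id 'TheBloke/Llama-2-13B-chat-GPTQ' \
  --quantize gptq
```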
I realize the overview says "most popular LLMs", but there is no list of actual model IDs/names, only broad categories. And while most likely not every model will load, it is clear that I do not understand what prevents a model from loading or how to fix it.
Advice needed and appreciated.
Thank you.