huggingface_hub.utils._errors.EntryNotFoundError: No .bin weights found for model

I am using the HF text-generation-inference (TGI) Docker container.

I can successfully run this model, e.g.:

docker run -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.0.0 --model-id 'tiiuae/falcon-7b-instruct'

However, after running this model I have a question. The Features section notes:

Quantization with bitsandbytes and GPT-Q

If this is right, how do I pass the bitsandbytes configuration settings?
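My guess, from the launcher options, is that it is the --quantize flag, i.e. something like the following, but I have not confirmed this is the right way to enable bitsandbytes:

docker run -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.0.0 --model-id 'tiiuae/falcon-7b-instruct' --quantize bitsandbytes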

Then I tried this model:

TheBloke/Llama-2-13B-chat-GPTQ

and got this error:

2023-08-06T20:01:24.114162319Z     asyncio.run(serve_inner(model_id, revision, sharded, quantize, trust_remote_code))
2023-08-06T20:01:24.114171555Z   File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
2023-08-06T20:01:24.114178322Z     return loop.run_until_complete(main)
2023-08-06T20:01:24.114186696Z   File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
2023-08-06T20:01:24.114191040Z     return future.result()
2023-08-06T20:01:24.114199773Z   File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 124, in serve_inner
2023-08-06T20:01:24.114204097Z     model = get_model(model_id, revision, sharded, quantize, trust_remote_code)
2023-08-06T20:01:24.114212214Z   File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 246, in get_model
2023-08-06T20:01:24.114235052Z     return llama_cls(
2023-08-06T20:01:24.114243359Z   File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_llama.py", line 58, in __init__
2023-08-06T20:01:24.114247573Z     filenames = weight_files(model_id, revision, ".bin")
2023-08-06T20:01:24.114256446Z   File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 95, in weight_files
2023-08-06T20:01:24.114261056Z     raise e
2023-08-06T20:01:24.114269364Z   File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 92, in weight_files
2023-08-06T20:01:24.114273610Z     filenames = weight_hub_files(model_id, revision, extension)
2023-08-06T20:01:24.114281344Z   File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 33, in weight_hub_files
2023-08-06T20:01:24.114285511Z     raise EntryNotFoundError(
2023-08-06T20:01:24.114293478Z huggingface_hub.utils._errors.EntryNotFoundError: No .bin weights found for model TheB
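My current guess is that the GPTQ checkpoint ships .safetensors files rather than .bin, and that the server has to be told to load it as GPTQ, i.e. something like this (again, not verified):

docker run -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.0.0 --model-id 'TheBloke/Llama-2-13B-chat-GPTQ' --quantize gptq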

I realize the overview says it supports most popular LLMs, but there isn't a list of actual model ids/names, only broad categories. And while I understand that not every model will load, it is clear that I do not understand what prevents a model from loading or how to fix it.
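For reference, this is how I was planning to check which weight files a repo actually contains, using the public Hub API (I am assuming the siblings field lists the repo files):

curl -s https://huggingface.co/api/models/TheBloke/Llama-2-13B-chat-GPTQ | python3 -c "import json,sys; [print(s['rfilename']) for s in json.load(sys.stdin)['siblings']]"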

Advice needed and appreciated.
Thank you.