I am using the Hugging Face text-generation-inference (TGI) Docker container.
I can successfully run a model, e.g.:
docker run -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.0.0 --model-id 'tiiuae/falcon-7b-instruct'
However, after running this model I have a question. The Features section of the README notes:
Quantization with bitsandbytes and GPT-Q
If that is right, how do I set the bitsandbytes configuration?
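My guess from the launcher's --help output is that quantization is enabled with a --quantize flag rather than a separate config file, so I would have tried something like the following (the flag name and value are my assumption, not something I have confirmed works):

```shell
# Same invocation as above, with quantization requested at launch.
# Assumption: the launcher accepts --quantize bitsandbytes (from `--help`).
docker run -p 8080:80 -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:1.0.0 \
  --model-id 'tiiuae/falcon-7b-instruct' \
  --quantize bitsandbytes
```

Is that the intended way, or are there additional bitsandbytes settings I should be passing?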
Then I tried this model:
TheBloke/Llama-2-13B-chat-GPTQ
and got this error:
asyncio.run(serve_inner(model_id, revision, sharded, quantize, trust_remote_code))
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 124, in serve_inner
    model = get_model(model_id, revision, sharded, quantize, trust_remote_code)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 246, in get_model
    return llama_cls(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_llama.py", line 58, in __init__
    filenames = weight_files(model_id, revision, ".bin")
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 95, in weight_files
    raise e
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 92, in weight_files
    filenames = weight_hub_files(model_id, revision, extension)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 33, in weight_hub_files
    raise EntryNotFoundError(
huggingface_hub.utils._errors.EntryNotFoundError: No .bin weights found for model TheB
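Reading the traceback, the server is looking for .bin weight files, and I suspect the GPTQ repo does not ship weights under that extension, so perhaps the launcher needs to be told explicitly that the model is GPTQ-quantized. Is something like this the right invocation? (The --quantize gptq value is my assumption from the Features section; I have not verified it.)

```shell
# Assumption: passing --quantize gptq tells the server to load the
# GPTQ-quantized weights instead of searching for standard .bin files.
docker run -p 8080:80 -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:1.0.0 \
  --model-id 'TheBloke/Llama-2-13B-chat-GPTQ' \
  --quantize gptq
```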
I realize the overview says "most popular LLMs", but there is no list of actual model IDs/names, only broad categories. And while most likely not every model will load, it is clear that I do not understand what prevents a model from loading or how to fix it.
Advice needed and appreciated.
Thank you.