I’m trying to run the Docker container, but it fails during the “Warming up model” step, after allocating about 11 GB of GPU memory, with the error: “You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.”
I tried adding -e GPTQ_BITS=4 -e GPTQ_GROUPSIZE=128, but that made no difference. I also read almost every issue on the repo, and I don’t think this problem has been reported before.
I also tried setting DISABLE_EXLLAMA=True and passing --disable-custom-kernels, with the same result.
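For completeness, this is roughly the variant of the command I ran with all of those options combined (the base command is in the Docker Command section below; whether GPTQ_BITS / GPTQ_GROUPSIZE / DISABLE_EXLLAMA are actually honored by this image is my assumption):

# same command as below, plus the GPTQ env vars and kernel flags I tried
# (that these env vars are read by the launcher is an assumption on my part)
sudo docker run --gpus all --shm-size 16g -p 8183:80 \
  --runtime=nvidia \
  -e GPTQ_BITS=4 -e GPTQ_GROUPSIZE=128 -e DISABLE_EXLLAMA=True \
  -v /storage/text-generation-inference/docker/data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id TheBloke/MythoMax-L2-13B-GPTQ --revision gptq-4bit-128g-actorder_True \
  --quantize gptq --disable-custom-kernels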
Docker Command
sudo docker run --gpus all --shm-size 16g -p 8183:80 \
--runtime=nvidia \
-v /storage/text-generation-inference/docker/data:/data ghcr.io/huggingface/text-generation-inference:latest \
--model-id TheBloke/MythoMax-L2-13B-GPTQ --revision gptq-4bit-128g-actorder_True \
--quantize gptq
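If the container ever got past warmup, I would expect to query it on the mapped port with a standard TGI /generate request like the one below (it never gets that far for me):

curl http://localhost:8183/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Hello, my name is", "parameters": {"max_new_tokens": 20}}'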
Version: ghcr.io/huggingface/text-generation-inference:latest
Model: TheBloke/MythoMax-L2-13B-GPTQ:gptq-4bit-128g-actorder_True
GPU: Nvidia A4000
Ubuntu 22.04
Logs
Command Output
2023-08-26T23:55:35.854791Z INFO text_generation_launcher: Args { model_id: "TheBloke/MythoMax-L2-13B-GPTQ", revision: Some("gptq-4bit-128g-actorder_True"), validation_workers: 2, sharded: None, num_shard: Some(1), quantize: Some(Gptq), dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "9f8054981ef0", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2023-08-26T23:55:35.855120Z INFO download: text_generation_launcher: Starting download process.
2023-08-26T23:55:42.115911Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2023-08-26T23:55:42.966478Z INFO download: text_generation_launcher: Successfully downloaded weights.
2023-08-26T23:55:42.967019Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2023-08-26T23:55:50.341447Z INFO text_generation_launcher: Using exllama kernels
[... "Using exllama kernels" logged 36 more times with successive timestamps ...]
2023-08-26T23:55:52.991872Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
[... "Using exllama kernels" logged 43 more times with successive timestamps ...]
2023-08-26T23:55:57.879732Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2023-08-26T23:55:57.900250Z INFO shard-manager: text_generation_launcher: Shard ready in 14.931675025s rank=0
2023-08-26T23:55:57.987600Z INFO text_generation_launcher: Starting Webserver
2023-08-26T23:55:59.443055Z INFO text_generation_router: router/src/main.rs:367: Serving revision 48caaecfc0ca83f44d056a3f46d4c1661bbea1e7 of model TheBloke/MythoMax-L2-13B-GPTQ
2023-08-26T23:55:59.452758Z INFO text_generation_router: router/src/main.rs:210: Warming up model
2023-08-26T23:56:11.299165Z ERROR warmup{max_input_length=1024 max_prefill_tokens=4096}:warmup: text_generation_client: router/client/src/lib.rs:33: Server error: transport error
Error: Warmup(Generation("transport error"))
2023-08-26T23:56:11.411063Z ERROR text_generation_launcher: Webserver Crashed
2023-08-26T23:56:11.411132Z INFO text_generation_launcher: Shutting down shards
2023-08-26T23:56:11.424793Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors. rank=0
2023-08-26T23:56:11.424860Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 4 rank=0
Sometimes I don’t get the “Shard complete standard error output” message at all.