@philschmid Sure thing!
I tested the new feature today with several different models. Some of them worked well (e.g. eleuther/gpt-neox-20b, databricks/dolly-v2-12b, stabilityai/stablelm-tuned-alpha-7b).
On the other hand, several models failed to initialize, in two distinct ways. I wasn’t able to make much of the stack traces for these, so hopefully the feedback is helpful!
Method Prefill Error / Tensor Device Errors
google/flan-t5-xxl, google/flan-ul2, and google/ul2 all became stuck during initialization (on GPU-large) and repeated errors until the endpoint was deleted.
The first set of errors was:
{"timestamp":"2023-05-30T18:56:14.642641Z",
"level":"ERROR",
"fields":{"message":"Method Prefill encountered an error.\nTraceback (most recent call last):\n File \"/opt/conda/bin/text-generation-server\", line 8, in <etc>...
Next, it would emit 4 errors similar to the one below, each with a different CUDA device number (0 through 3).
{"timestamp":"2023-05-30T18:56:14.643039Z",
"level":"ERROR",
"message":"Server error: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:3! (when checking argument for argument index in method wrapper_CUDA__index_select)",
"target":"text_generation_client",
"filename":"router/client/src/lib.rs",
"line_number":33,
"span":{"id":18446744073709551615,
"size":1,"name":"prefill"},
"spans":[
{"http.client_ip":"","http.flavor":"1.1","http.host":"10.41.30.199:80","http.method":"GET","http.route":"/health","http.scheme":"HTTP","http.target":"/health","http.user_agent":"kube-probe/1.22+","otel.kind":"server","otel.name":"GET /health","trace_id":"b1ebcdf576f2eaf5a144d43ebc23ad12","name":"HTTP request"},
{"name":"health"},
{"id":18446744073709551615,"size":1,"name":"prefill"},
{"id":18446744073709551615,"size":1,"name":"prefill"}
]}
After this, it would keep repeating the Method Prefill error followed by the Server errors.
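In case it helps with triaging logs like these, here is a minimal sketch of how the repeated errors could be tallied. It assumes the downloaded log has one JSON object per line and that the message sits either at the top level or under "fields", as in the excerpts above; the function name summarize_errors is made up for illustration and is not part of text-generation-inference.

```python
import json
import re
from collections import Counter

# Matches device names such as "cuda:3" inside an error message.
DEVICE_RE = re.compile(r"cuda:(\d+)")


def summarize_errors(lines):
    """Count ERROR-level entries by the first line of their message and
    collect the CUDA device numbers mentioned in those messages.

    Assumption: each element of `lines` is one JSON log entry.
    """
    counts = Counter()
    devices = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log entry
        if not isinstance(entry, dict) or entry.get("level") != "ERROR":
            continue
        # The message may be top-level or nested under "fields".
        message = entry.get("message")
        if not message:
            fields = entry.get("fields")
            message = fields.get("message", "") if isinstance(fields, dict) else ""
        if not message:
            continue
        counts[message.splitlines()[0]] += 1
        devices.update(int(d) for d in DEVICE_RE.findall(message))
    return counts, sorted(devices)
```

Calling it as `summarize_errors(open(path))` on a saved log file (where `path` is wherever the log was downloaded to) would return the per-message counts and the list of CUDA devices that appeared in the errors.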
Error: ShardCannotStart
When running the models cerebras/Cerebras-GPT-6.7B and cerebras/Cerebras-GPT-13B, I received a ShardCannotStart error. In the log, this was preceded by the following error:
ValueError: sharded is not supported for AutoModel
If it is of any help, I also downloaded the full logs from the initialization. I wasn’t quite sure how to attach them here!