Container build failed

I want to use the Flan-T5-xxl model on an Inference Endpoint, but I cannot get it to work.
The first time I tried, the container was building for over an hour, so I interrupted the build and deleted the endpoint.
The second time I gave it more time and used the server in Ireland instead of the US. After roughly two hours of waiting, I received an error:
[screenshot of the error message]

I do not know how to make this work…

The Flan-T5-small model on a medium CPU instance worked well; it took something like 10 minutes to build before I could use it.
I have all my code ready and would love to let it run through my data. Any tips are highly appreciated…
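For reference, this is roughly how I call the endpoint from my code (a minimal sketch; the URL and token are placeholders, and the response shape is my assumption based on the standard text2text-generation task):

```python
import requests

# Placeholders -- substitute your own endpoint URL and access token.
ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"
HF_TOKEN = "hf_..."

headers = {"Authorization": f"Bearer {HF_TOKEN}"}

def query(text: str) -> str:
    """Send one input to the endpoint and return the generated text."""
    response = requests.post(ENDPOINT_URL, headers=headers, json={"inputs": text})
    response.raise_for_status()
    # text2text-generation endpoints typically return [{"generated_text": ...}]
    return response.json()[0]["generated_text"]

print(query("Translate to German: How old are you?"))
```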

The container build failed again.
I am now using the fp16 version of the model, which only needs a medium-size GPU. Building this container also took over two hours. I checked every 30 minutes whether it was ready, and all of a sudden it had already been running for more than an hour…
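For context, as I understand it the fp16 variant fits on a medium GPU because half precision roughly halves the memory footprint versus fp32. Loading it locally would look something like this (a sketch only; the model id and generation settings are my assumptions, not what the endpoint runs internally):

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xxl")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-xxl",
    torch_dtype=torch.float16,  # half precision: ~half the fp32 memory
    device_map="auto",          # place weights on the available GPU
)

inputs = tokenizer("Summarize: ...", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```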

However, it seems to work now. The problem is that it is way slower than expected…
The response times seem to be very unevenly distributed.
A response time of 0.5 seconds seems normal to me, but a response time of 80 seconds seems out of order, at least if I am reading the log right…
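To pin the numbers down independently of the log, I can time the requests client-side with something like this (same placeholder URL and token as above):

```python
import time
import statistics
import requests

ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"
HF_TOKEN = "hf_..."
headers = {"Authorization": f"Bearer {HF_TOKEN}"}

def timed_query(text: str) -> float:
    """Return the wall-clock latency of a single request in seconds."""
    start = time.perf_counter()
    requests.post(ENDPOINT_URL, headers=headers, json={"inputs": text}).raise_for_status()
    return time.perf_counter() - start

latencies = [timed_query(f"Example input {i}") for i in range(20)]
print(f"median: {statistics.median(latencies):.2f}s  "
      f"min: {min(latencies):.2f}s  max: {max(latencies):.2f}s")
```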

Is there a way to speed this up? And why do the times differ so much from example to example? (My inputs are roughly equally long; one is maybe three times longer than another…)