Container build failed

I want to use the Flan-T5-xxl model on an Inference Endpoint, but I cannot get it to work.
The first time I tried, the container was building for over an hour, so I interrupted the build and deleted the endpoint.
The second time I gave it more time and used the server in Ireland instead of the US. After roughly two hours of waiting, I received an error:
[screenshot of the error message]

I do not know how to make this work…

The Flan-T5-small model on a medium CPU instance worked well; it took something like 10 minutes to build before I could use it.
I have all my code ready and would love to let it run through my data. Any tips are highly appreciated…
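For reference, this is roughly how I call the endpoint from my code (a minimal sketch; the URL and token are placeholders, and the response shape is my assumption based on the standard text2text-generation task):

```python
import requests

# Placeholders -- substitute your own endpoint URL and access token.
ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"
HF_TOKEN = "hf_..."

headers = {"Authorization": f"Bearer {HF_TOKEN}"}

def query(text: str) -> str:
    """Send one input to the endpoint and return the generated text."""
    response = requests.post(ENDPOINT_URL, headers=headers, json={"inputs": text})
    response.raise_for_status()
    # text2text-generation endpoints typically return [{"generated_text": ...}]
    return response.json()[0]["generated_text"]

print(query("Translate to German: How old are you?"))
```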

The container build failed again.
I am now using the fp16 version of the model, which only needs a medium-size GPU. Building this container also took over two hours. I checked every 30 minutes whether it was ready, and all of a sudden it had already been running for more than an hour…
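For context, as I understand it the fp16 variant fits on a medium GPU because half precision roughly halves the memory footprint versus fp32. Loading it locally would look something like this (a sketch only; the model id and generation settings are my assumptions, not what the endpoint runs internally):

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xxl")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-xxl",
    torch_dtype=torch.float16,  # half precision: ~half the fp32 memory
    device_map="auto",          # place weights on the available GPU
)

inputs = tokenizer("Summarize: ...", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```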

However, it seems to work now. The problem is that it is way slower than expected…
The response times seem to be very unevenly distributed.
A response time of 0.5 seconds seems normal to me, but a response time of 80 seconds seems out of order, at least if I am reading the log right…
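To pin the numbers down independently of the log, I can time the requests client-side with something like this (same placeholder URL and token as above):

```python
import time
import statistics
import requests

ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"
HF_TOKEN = "hf_..."
headers = {"Authorization": f"Bearer {HF_TOKEN}"}

def timed_query(text: str) -> float:
    """Return the wall-clock latency of a single request in seconds."""
    start = time.perf_counter()
    requests.post(ENDPOINT_URL, headers=headers, json={"inputs": text}).raise_for_status()
    return time.perf_counter() - start

latencies = [timed_query(f"Example input {i}") for i in range(20)]
print(f"median: {statistics.median(latencies):.2f}s  "
      f"min: {min(latencies):.2f}s  max: {max(latencies):.2f}s")
```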

Is there a way to speed this up? And why do the times differ so much from example to example? (My inputs are roughly equally long; one is maybe three times longer than another…)