Auto-replicas is not working

I am deploying the blip2-flan-t5-xl model [Salesforce/blip2-flan-t5-xl · Hugging Face] on an AWS NVIDIA A10G, and I am having some issues triggering auto-scaling. The documentation says that new replicas are created when GPU usage remains above 80% for more than 2 minutes.

However, my experiments show that when GPU usage approaches 80%, the server starts rejecting requests, so no replicas are created. I have attached screenshots below to illustrate the problem. As they show, server errors increase when GPU usage nears 80%, which brings GPU usage back below 80% before the 2-minute condition is met, so no new replicas are created.
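
To make my reading of the scaling rule concrete, here is a minimal Python sketch. It is only my assumption of how a "stay above 80% for 2 minutes" check might behave, not the platform's actual implementation. The function name `sustained_above`, the 10-second sampling interval, and the sample values are all hypothetical and made up for illustration; they are not taken from my metrics.

```python
def sustained_above(samples, threshold=80.0, window_s=120, interval_s=10):
    """Return True if usage stays above `threshold` for at least `window_s` seconds.

    `samples` is a list of GPU usage percentages, one per `interval_s` seconds.
    Hypothetical model of the scaling rule, not the real autoscaler logic.
    """
    run = 0  # number of consecutive samples above the threshold
    for usage in samples:
        # Any sample at or below the threshold resets the streak.
        run = run + 1 if usage > threshold else 0
        if run * interval_s >= window_s:
            return True
    return False


# Made-up trace: usage held steadily above 80% for 150 seconds.
steady = [85.0] * 15

# Made-up trace: usage touches 80%, requests get rejected, usage drops again.
oscillating = [70.0, 78.0, 81.0, 79.0, 75.0, 82.0, 78.0, 74.0] * 3

print(sustained_above(steady))       # True
print(sustained_above(oscillating))  # False
```

If the real check works anything like this, a load pattern where rejected requests pull usage back under 80% every few seconds would never satisfy the 2-minute condition, which matches what I am seeing.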

Does anybody have an idea?