InvokeEndpoint Error : Predict function Invocation Timeout

I am trying to load the Hugging Face transformer model GPT-NeoXT-Chat-Base-20B from AWS S3. The SageMaker endpoint is created successfully.
predictor = huggingface_model.deploy(
    ModelDataDownloadTimeoutInSeconds = 2400,
    ContainerStartupHealthCheckTimeoutInSeconds = 2400,
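
For reference, the SageMaker Python SDK's `deploy` method documents these two timeouts under snake_case keyword names, `model_data_download_timeout` and `container_startup_health_check_timeout`. The sketch below only builds the keyword arguments. The helper name `build_deploy_kwargs` and the instance type passed to it are assumptions for illustration, not part of the original post. Both settings cover endpoint startup, not the time allowed for a single invocation.

```python
def build_deploy_kwargs(instance_type, startup_timeout_s=2400):
    """Build keyword arguments for huggingface_model.deploy().

    Hypothetical helper: the name and defaults are illustrative only.
    """
    if startup_timeout_s <= 0:
        raise ValueError("startup_timeout_s must be positive")
    return {
        "initial_instance_count": 1,
        "instance_type": instance_type,
        # Time allowed to download the model data from S3.
        "model_data_download_timeout": startup_timeout_s,
        # Time allowed for the container to pass its startup health check.
        "container_startup_health_check_timeout": startup_timeout_s,
    }


# The instance type here is an assumption; choose one that fits the model.
kwargs = build_deploy_kwargs("ml.p3.2xlarge")
# predictor = huggingface_model.deploy(**kwargs)  # needs a configured HuggingFaceModel
```

Keeping the arguments in a dictionary makes it easy to log exactly what was passed to `deploy` when comparing attempts.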

Calling the endpoint fails with an invocation timeout. I guess the default is 1 minute. How can I increase the timeout, since a prediction might take longer than that?

Error while calling predict function:

predictor.predict({'inputs': "Can you please let us know more details about your "})

Error :
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/botocore/, in BaseClient._make_api_call(self, operation_name, api_params)
958 error_code = parsed_response.get("Error", {}).get("Code")
959 error_class = self.exceptions.from_code(error_code)
--> 960 raise error_class(parsed_response, operation_name)
961 else:
962 return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from primary with message "Your invocation timed out while waiting for a response from container primary. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.". See in account 597748488783 for more information.
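
The InvokeEndpoint documentation states that model containers must respond to a request within 60 seconds, so it can help to measure how long a call actually takes before it fails. The following is a minimal sketch of client-side timing. `timed_call` and `fake_predict` are hypothetical names; `fake_predict` stands in for `predictor.predict` so the example runs without an endpoint.

```python
import time


def timed_call(fn, payload):
    """Call fn(payload) and return (result, error, elapsed_seconds).

    The elapsed time is recorded even when the call raises.
    """
    start = time.perf_counter()
    try:
        result, error = fn(payload), None
    except Exception as exc:  # keep the exception so the caller can inspect it
        result, error = None, exc
    return result, error, time.perf_counter() - start


def fake_predict(payload):
    # Stand-in for predictor.predict; returns a response-shaped value.
    return [{"generated_text": payload["inputs"] + "..."}]


result, error, elapsed = timed_call(fake_predict, {"inputs": "Hello"})
print(f"error={error!r}, took {elapsed:.3f}s")
```

With a real endpoint, passing `predictor.predict` instead of `fake_predict` shows whether failures cluster around the 60-second mark, which would point to slow inference rather than a networking problem.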

Please suggest any pointers to proceed.

I'm having the same issue with all of the GPT models on AWS. BERT-based models work just fine.

Update: I was able to get this to run properly. I wonder if it has to do with the instance size assigned to the predictor. I started with the default ml.m5.xlarge and then tried larger machines. The instance type shown below finally allowed me to invoke the endpoint.

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,        # number of instances
    instance_type='ml.p3.2xlarge'    # ec2 instance type
)
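
To reason about whether instance size is the limiting factor, a rough estimate of the memory needed for the model weights alone can be useful. The sketch below is back-of-the-envelope arithmetic, not a measurement: it assumes 20 billion parameters (taken from the "20B" in the model name) and ignores activations, the attention cache, and framework overhead. `weight_memory_gib` is a hypothetical helper name.

```python
def weight_memory_gib(num_params, bytes_per_param=2):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


params_20b = 20e9  # assumption: "20B" in the model name means 20 billion parameters
fp16_gib = weight_memory_gib(params_20b)      # 2 bytes per parameter
fp32_gib = weight_memory_gib(params_20b, 4)   # 4 bytes per parameter
print(f"fp16: {fp16_gib:.1f} GiB, fp32: {fp32_gib:.1f} GiB")
```

Comparing these figures with the memory of the chosen instance type (not stated in this thread) gives a first indication of whether the model can be loaded at all, before looking at inference latency.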