I am trying to use the AWS S3 option to load the Hugging Face transformer model GPT-NeoXT-Chat-Base-20B. The SageMaker endpoint is created successfully:
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    ModelDataDownloadTimeoutInSeconds=2400,
    ContainerStartupHealthCheckTimeoutInSeconds=2400,
    endpoint_name=endpoint_name,
)
While calling the endpoint, I get an invocation timeout. I believe the default is one minute; how can I increase the timeout interval, since the prediction can take longer than that?
Error while calling the predict function:
Code:
predictor.predict({"inputs": "Can you please let us know more details about your "})
Error:
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/botocore/client.py:960, in BaseClient._make_api_call(self, operation_name, api_params)
    958 error_code = parsed_response.get("Error", {}).get("Code")
    959 error_class = self.exceptions.from_code(error_code)
--> 960 raise error_class(parsed_response, operation_name)
961 else:
962 return parsed_response
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from primary with message "Your invocation timed out while waiting for a response from container primary. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.". See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/GPT-Neo-XT-Test-2023-03-27-05-32-22-539 in account 597748488783 for more information.
Any pointers on how to proceed would be appreciated.