I am trying to use the AWS S3 option to load the Hugging Face transformer model GPT-NeoXT-Chat-Base-20B. The SageMaker endpoint is created successfully:
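predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    # SageMaker Python SDK keyword names for the CreateEndpointConfig fields
    # ModelDataDownloadTimeoutInSeconds / ContainerStartupHealthCheckTimeoutInSeconds
    model_data_download_timeout=2400,
    container_startup_health_check_timeout=2400,
    endpoint_name=endpoint_name,
)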
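The two timeout settings in the deploy call above only cover downloading the model and the container's startup health check; as far as I know they do not change how long a single invocation may take. If a generation regularly runs past a minute, one option is SageMaker Asynchronous Inference, which queues the request and writes the result to S3 (the AWS docs describe much longer processing times for async endpoints than for real-time ones). A minimal sketch, assuming SageMaker Python SDK v2; the helper names `deploy_async` and `predict_and_wait` are hypothetical.

```python
def deploy_async(huggingface_model, endpoint_name, output_path,
                 instance_type="ml.g5.12xlarge"):
    """Deploy the model behind an asynchronous endpoint (sketch; needs the sagemaker SDK)."""
    # Imported lazily so this helper can be defined without the SDK installed.
    from sagemaker.async_inference import AsyncInferenceConfig

    return huggingface_model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
        endpoint_name=endpoint_name,
        # Results are written under this S3 prefix instead of being returned inline.
        async_inference_config=AsyncInferenceConfig(output_path=output_path),
    )


def predict_and_wait(async_predictor, payload, max_attempts=60, delay=15):
    """Submit one request, then poll for the result for up to max_attempts * delay seconds."""
    from sagemaker.async_inference import WaiterConfig

    response = async_predictor.predict_async(data=payload)
    return response.get_result(
        waiter_config=WaiterConfig(max_attempts=max_attempts, delay=delay)
    )
```

Usage would look like `predictor = deploy_async(huggingface_model, endpoint_name, "s3://my-bucket/async-out/")` followed by `predict_and_wait(predictor, {"inputs": "..."})`, where `my-bucket` is a placeholder.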
When I call the endpoint, I get an invocation timeout. I believe the default is 1 minute; how can I increase the timeout, given that a prediction might take more than a minute?
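On the client side, the boto3 `sagemaker-runtime` client stops waiting after botocore's default read timeout of 60 seconds and may retry the request. A minimal sketch of a predictor whose client uses a longer read timeout and no retries, assuming boto3, botocore and the SageMaker Python SDK are installed; `runtime_client_kwargs` and `make_predictor` are hypothetical helper names. As far as I know, a real-time endpoint also has a server-side limit of 60 seconds per request, so a longer client timeout only helps when the container itself answers within that window.

```python
def runtime_client_kwargs(read_timeout_s=120):
    """Keyword arguments for botocore.config.Config (hypothetical helper)."""
    return {
        "read_timeout": read_timeout_s,  # seconds to wait for the response
        "connect_timeout": 10,  # seconds to establish the connection
        "retries": {"max_attempts": 0},  # do not silently resend a slow request
    }


def make_predictor(endpoint_name, read_timeout_s=120):
    """Build a HuggingFacePredictor whose runtime client uses the longer timeout."""
    # Imported lazily so the pure helper above works without the AWS libraries.
    import boto3
    import sagemaker
    from botocore.config import Config
    from sagemaker.huggingface import HuggingFacePredictor

    runtime = boto3.client(
        "sagemaker-runtime", config=Config(**runtime_client_kwargs(read_timeout_s))
    )
    session = sagemaker.Session(sagemaker_runtime_client=runtime)
    return HuggingFacePredictor(endpoint_name, sagemaker_session=session)
```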
Error while calling predict function:
Code:
predictor.predict({"inputs": "Can you please let us know more details about your "})
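The same request can be sent with boto3 directly, which makes the JSON body explicit. With the Hugging Face text-generation containers, a `parameters` object (for example `max_new_tokens`) is passed through to generation, and capping the number of new tokens is a simple way to keep response time down for a 20B-parameter model. A sketch; `build_payload` and `invoke` are hypothetical helpers, and the 64-token default is arbitrary.

```python
import json


def build_payload(prompt, max_new_tokens=64):
    """JSON body for a text-generation request; max_new_tokens bounds the output length."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}


def invoke(runtime_client, endpoint_name, payload):
    """Call InvokeEndpoint with a JSON body and decode the JSON response."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(payload),
    )
    # The response body is a streaming object; read it fully, then parse the JSON.
    return json.loads(response["Body"].read())
```

For example: `invoke(boto3.client("sagemaker-runtime"), endpoint_name, build_payload("Can you please let us know more details about your "))`.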
Update: I was able to get this to run properly. I wonder whether it has to do with the instance size assigned to the predictor. I initially used the default ml.m5.xlarge and then tried larger machines; the instance type below finally allowed me to invoke the endpoint.
# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,  # number of instances
    instance_type='ml.p3.2xlarge'  # EC2 instance type
)
I'm using an ml.g5.12xlarge for my LLM, which is the same size that runs other foundation LLMs with a similar parameter count (i.e. the JumpStart versions), and I'm still getting the timeout error. So I don't think it's just the size of the instance.
(I'm using a fine-tuned falcon-7b model deployed to an ml.g5.12xlarge, the same instance type I fine-tuned the model on.)
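A rough way to compare instance sizes is to estimate the memory needed for the weights alone: parameter count times bytes per parameter (2 bytes in fp16/bf16). This ignores activations, the KV cache and framework overhead, so it is only a lower bound. The parameter counts below are nominal values taken from the model names (an assumption), and the GPU figure assumes the ml.g5.12xlarge's four NVIDIA A10G GPUs with 24 GB each.

```python
def weight_memory_gb(num_params, bytes_per_param=2):
    """Lower-bound memory (decimal GB) for model weights only."""
    return num_params * bytes_per_param / 1e9


# Nominal parameter counts (assumption: taken from the model names).
MODELS = {"GPT-NeoXT-Chat-Base-20B": 20e9, "falcon-7b": 7e9}
G5_12XLARGE_GPU_GB = 4 * 24  # four A10G GPUs, 24 GB each

for name, params in MODELS.items():
    need = weight_memory_gb(params)
    print(f"{name}: ~{need:.0f} GB of fp16 weights vs {G5_12XLARGE_GPU_GB} GB of GPU memory")
```

By this estimate the 20B model needs roughly 40 GB just for its weights, far more than the 16 GiB of RAM on an ml.m5.xlarge, whereas a 7B model (about 14 GB) fits comfortably on an ml.g5.12xlarge, which is consistent with memory not being the cause of the falcon-7b timeout.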
I'm guessing it might have to do with how inference is invoked. For instance, in this SageMaker Pipelines example, under "Define a Register Model Step…" we just pass an image to use for inference (retrieved with image_scope="inference") and give the Model object the model data, but no inference script. Later, under "Deploy latest approved model to a real-time endpoint", we grab the approved model_package_arn and deploy the model to an endpoint with model.deploy(initial_instance_count=x, instance_type="compute type", endpoint_name="endpoint name"), but I still never see where we tell the endpoint or the registered Model how to run inference. Is there a black-box evaluate/predict method that SageMaker defaults to, which just doesn't work yet for Hugging Face LLM-type models? Investigating…
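As far as I know, there is a default handler: in the Hugging Face inference containers, the SageMaker Hugging Face Inference Toolkit loads the model as a `transformers` pipeline (the task comes from the `HF_TASK` environment variable or the model config) and runs it on the `inputs`/`parameters` JSON. It can be overridden by shipping a `code/inference.py` inside `model.tar.gz` that defines functions such as `model_fn` and `predict_fn`. This applies to the toolkit-based containers; the separate Hugging Face LLM (TGI) container runs its own server and, to my knowledge, does not use `inference.py`. A minimal sketch, assuming a text-generation model; the 128-token default is an arbitrary illustration.

```python
# code/inference.py -- custom handler picked up by the Hugging Face Inference Toolkit.


def model_fn(model_dir):
    """Load the model once when the container starts."""
    # Imported here so the handler file has no import-time dependencies.
    from transformers import pipeline

    # device_map="auto" (requires accelerate) spreads a large model across the GPUs.
    return pipeline("text-generation", model=model_dir, device_map="auto")


def predict_fn(data, model):
    """Run generation on a {"inputs": ..., "parameters": {...}} request."""
    inputs = data.pop("inputs", data)
    parameters = data.pop("parameters", None) or {}
    # Bound the output length so one request cannot generate indefinitely.
    parameters.setdefault("max_new_tokens", 128)
    return model(inputs, **parameters)
```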