Hi,
I am deploying gte-large-en-v1.5 to SageMaker via the sagemaker.huggingface.HuggingFaceModel.deploy method.
When requesting inference I get the following error:
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "Loading /.sagemaker/mms/models/Alibaba-NLP__gte-large-en-v1.5 requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code\u003dTrue` to remove this error."
}"
This is my Hub environment variable dictionary for deployment:
# Hub Model configuration. <https://huggingface.co/models>
hub = {
    'HF_MODEL_ID': 'Alibaba-NLP/gte-large-en-v1.5',  # model_id from hf.co/models
    'HF_TASK': 'feature-extraction',
    'MMS_MAX_REQUEST_SIZE': json.dumps(200000000),
    'MMS_MAX_RESPONSE_SIZE': json.dumps(200000000),
    'TRUST_REMOTE_CODE': json.dumps(True)  # https://github.com/huggingface/text-generation-inference/issues/493
}
I have also tried using HF_TRUST_REMOTE_CODE (sketched below), with the same result. How can I set this boolean to true in my SageMaker endpoint environment?
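For reference, the HF_TRUST_REMOTE_CODE attempt was essentially the same dictionary with only the key swapped:

hub = {
    'HF_MODEL_ID': 'Alibaba-NLP/gte-large-en-v1.5',
    'HF_TASK': 'feature-extraction',
    'MMS_MAX_REQUEST_SIZE': json.dumps(200000000),
    'MMS_MAX_RESPONSE_SIZE': json.dumps(200000000),
    'HF_TRUST_REMOTE_CODE': json.dumps(True)  # swapped in for TRUST_REMOTE_CODE; same 400 error
}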
Thanks.
My full deployment code is here:
import json

# Hub Model configuration. <https://huggingface.co/models>
hub = {
    'HF_MODEL_ID': 'Alibaba-NLP/gte-large-en-v1.5',  # model_id from hf.co/models
    'HF_TASK': 'feature-extraction',  # NLP task you want to use for predictions
    'MMS_MAX_REQUEST_SIZE': json.dumps(200000000),
    'MMS_MAX_RESPONSE_SIZE': json.dumps(200000000),
    'TRUST_REMOTE_CODE': json.dumps(True)  # https://github.com/huggingface/text-generation-inference/issues/493
    # 'MAX_INPUT_LENGTH': json.dumps(3000),  # https://github.com/huggingface/text-embeddings-inference/issues/141
    # 'SM_NUM_GPUS': '1',
}
from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    env=hub,  # configuration for loading model from Hub
    role=role,  # IAM role with permissions to create an endpoint
    py_version='py310',
    transformers_version="4.37.0",  # transformers version used
    pytorch_version="2.1.0",  # pytorch version used
)
# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
    endpoint_name=ENDPOINT_NAME,
)
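For completeness, the inference request that triggers the error is just the standard predict call for the feature-extraction task (the input sentence below is only a placeholder):

# Minimal invocation sketch; any payload hits the same 400 because the
# error occurs while the model is being loaded, not during inference.
embeddings = predictor.predict({
    "inputs": "The quick brown fox jumps over the lazy dog.",
})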