When trying to deploy an EncoderDecoderModel on SageMaker, the predictor throws an error saying it can't find the config.json.
The model is created like this:
from transformers import PreTrainedTokenizerFast
from transformers import EncoderDecoderModel
from transformers import pipeline
tokenizer = PreTrainedTokenizerFast.from_pretrained("bert-base-multilingual-uncased")
tokenizer.bos_token = tokenizer.cls_token
tokenizer.eos_token = tokenizer.sep_token
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
multibert = EncoderDecoderModel.from_encoder_decoder_pretrained(
"bert-base-multilingual-uncased", "bert-base-multilingual-uncased"
)
# set special tokens
multibert.config.decoder_start_token_id = tokenizer.bos_token_id
multibert.config.eos_token_id = tokenizer.eos_token_id
multibert.config.pad_token_id = tokenizer.pad_token_id
m = pipeline("translation", model=multibert, tokenizer=tokenizer)
m.save_pretrained('test-model')
Note: the model hasn't been properly trained yet, so it outputs poor translations, but it's still a valid model object.
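As a quick sanity check (a minimal sketch; the exact file list may vary by transformers version), the local export does contain a config.json:

import os

# List what pipeline.save_pretrained() wrote to disk; I expect config.json,
# the model weights, and the tokenizer files to show up here.
print(sorted(os.listdir("test-model")))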
Then I compressed the model and pushed it to S3 like this:
! tar -cvzf test-model.tar.gz test-model
! aws s3 cp test-model.tar.gz s3://mybucket/test-model.tar.gz
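For reference, the archive contents can be inspected like this (a small sketch using the standard tarfile module) to double-check which paths the entries end up under once SageMaker unpacks it:

import tarfile

# Print the entry names inside the archive to see its directory layout.
with tarfile.open("test-model.tar.gz", "r:gz") as tar:
    for name in tar.getnames():
        print(name)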
And deployed the model like this:
from sagemaker.huggingface import HuggingFaceModel
import sagemaker
import boto3
client = boto3.client('sts')
account = client.get_caller_identity()['Account']
sess = boto3.session.Session()
role = sagemaker.get_execution_role()
ecr_uri = '123456789000.dkr.ecr.us-east-2.amazonaws.com/huggingface-pytorch-inference-custom'
hub = {
    'HF_TASK': 'translation',
    'SAGEMAKER_CONTAINER_LOG_LEVEL': 10
}

huggingface_model = HuggingFaceModel(
    model_data="s3://mybucket/test-model.tar.gz",
    image_uri=ecr_uri,
    env=hub,
    role=role,
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge"
)
Note: The custom ECR image is just an extension of the canonical ones listed at deep-learning-containers/available_images.md at master · aws/deep-learning-containers · GitHub. I've deployed other off-the-shelf models with it and they work out of the box, e.g. Helsinki-NLP/opus-mt-de-en · Hugging Face.
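For comparison, the off-the-shelf deployment was along these lines (a rough sketch, exact versions aside): instead of passing model_data from S3, the model is pulled straight from the Hub via the HF_MODEL_ID / HF_TASK environment variables.

# Sketch of the working off-the-shelf deployment, for comparison.
hub_model = HuggingFaceModel(
    env={
        'HF_MODEL_ID': 'Helsinki-NLP/opus-mt-de-en',  # pulled from the Hugging Face Hub
        'HF_TASK': 'translation'
    },
    image_uri=ecr_uri,
    role=role,
)
hub_predictor = hub_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge"
)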
The deployment looks successful, but when calling predict, e.g.
predictor.predict(["hello world"])
it throws the following error:
---------------------------------------------------------------------------
ModelError Traceback (most recent call last)
<ipython-input-20-c5d2aedef5cb> in <module>
----> 1 predictor.predict(["hello world"])
/opt/conda/lib/python3.8/site-packages/sagemaker/predictor.py in predict(self, data, initial_args, target_model, target_variant, inference_id)
159 data, initial_args, target_model, target_variant, inference_id
160 )
--> 161 response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
162 return self._handle_response(response)
163
/opt/conda/lib/python3.8/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
512 )
513 # The "self" in this scope is referring to the BaseClient.
--> 514 return self._make_api_call(operation_name, kwargs)
515
516 _api_call.__name__ = str(py_operation_name)
/opt/conda/lib/python3.8/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
936 error_code = parsed_response.get("Error", {}).get("Code")
937 error_class = self.exceptions.from_code(error_code)
--> 938 raise error_class(parsed_response, operation_name)
939 else:
940 return parsed_response
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": "/.sagemaker/mms/models/model does not appear to have a file named config.json. Checkout \u0027https://huggingface.co//.sagemaker/mms/models/model/None\u0027 for available files."
}