Sagemaker/mms/models/model does not appear to have a file named config.json

When trying to deploy an EncoderDecoderModel on SageMaker, the predictor throws an error saying it can’t find the config.json.

The model is created like this:

from transformers import PreTrainedTokenizerFast
from transformers import EncoderDecoderModel
from transformers import pipeline

tokenizer = PreTrainedTokenizerFast.from_pretrained("bert-base-multilingual-uncased")

tokenizer.bos_token = tokenizer.cls_token
tokenizer.eos_token = tokenizer.sep_token
tokenizer.add_special_tokens({'pad_token': '[PAD]'})

multibert = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-multilingual-uncased", "bert-base-multilingual-uncased"
)

# set special tokens
multibert.config.decoder_start_token_id = tokenizer.bos_token_id
multibert.config.eos_token_id = tokenizer.eos_token_id
multibert.config.pad_token_id = tokenizer.pad_token_id


m = pipeline("translation", model=multibert, tokenizer=berttokenizer)
m.save_pretrained('test-model')

Note: Since the model is not trained, it outputs poor translations, but it is still a valid model object.

Then I compressed the model and pushed it up to S3 like this:

! tar -cvzf test-model.tar.gz test-model
! aws s3 cp test-model.tar.gz s3://mybucket/test-model.tar.gz

And deployed the model like this:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

import boto3

client = boto3.client('sts')
account = client.get_caller_identity()['Account']
sess = boto3.session.Session()

role = sagemaker.get_execution_role()

ecr_uri = '123456789000.dkr.ecr.us-east-2.amazonaws.com/huggingface-pytorch-inference-custom'

hub = {
    'HF_TASK':'translation',
    'SAGEMAKER_CONTAINER_LOG_LEVEL': 10
}

huggingface_model = HuggingFaceModel(
    model_data="s3://mybucket/test-model.tar.gz",
    image_uri=ecr_uri,
    env=hub,
    role=role,
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge"
)

Note: The custom ECR image is just an extension of the canonical ones listed in deep-learning-containers/available_images.md at master · aws/deep-learning-containers · GitHub. I’ve tried deploying other off-the-shelf models and they work out of the box, e.g. Helsinki-NLP/opus-mt-de-en · Hugging Face.
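
For reference, deploying such a hub model can be done by pointing the container at the Hub via environment variables instead of model_data. This is a sketch along the lines of what I ran (not verbatim; the HF_MODEL_ID / HF_TASK variables tell the container to download the model at startup):

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()
# Same custom image as above (illustrative account/region).
ecr_uri = '123456789000.dkr.ecr.us-east-2.amazonaws.com/huggingface-pytorch-inference-custom'

hub_model = HuggingFaceModel(
    env={
        'HF_MODEL_ID': 'Helsinki-NLP/opus-mt-de-en',  # pulled from the Hub at container startup
        'HF_TASK': 'translation',
    },
    image_uri=ecr_uri,
    role=role,
)

hub_predictor = hub_model.deploy(
    initial_instance_count=1,
    instance_type='ml.g4dn.xlarge',
)

# The container expects a JSON payload with an 'inputs' key.
hub_predictor.predict({'inputs': 'Hallo Welt'})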

The deployment of my own model looks successful, but when trying to predict, e.g.

predictor.predict(["hello world"])

It throws the error:

---------------------------------------------------------------------------
ModelError                                Traceback (most recent call last)
<ipython-input-20-c5d2aedef5cb> in <module>
----> 1 predictor.predict(["hello world"])

/opt/conda/lib/python3.8/site-packages/sagemaker/predictor.py in predict(self, data, initial_args, target_model, target_variant, inference_id)
    159             data, initial_args, target_model, target_variant, inference_id
    160         )
--> 161         response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
    162         return self._handle_response(response)
    163 

/opt/conda/lib/python3.8/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    512                 )
    513             # The "self" in this scope is referring to the BaseClient.
--> 514             return self._make_api_call(operation_name, kwargs)
    515 
    516         _api_call.__name__ = str(py_operation_name)

/opt/conda/lib/python3.8/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    936             error_code = parsed_response.get("Error", {}).get("Code")
    937             error_class = self.exceptions.from_code(error_code)
--> 938             raise error_class(parsed_response, operation_name)
    939         else:
    940             return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "/.sagemaker/mms/models/model does not appear to have a file named config.json. Checkout \u0027https://huggingface.co//.sagemaker/mms/models/model/None\u0027 for available files."
}

Q: Is there additional configuration that I need to do after m = pipeline("translation", model=multibert, tokenizer=berttokenizer); m.save_pretrained('test-model')?

Q: Are there examples of deploying EncoderDecoderModel with different pipeline tasks?

Hi @alvations

I see a few potential issues in your code, but before I list them - what are you eventually trying to achieve? What is the job of the endpoint going to be? Just trying to get more context in case there are other ways to tackle the underlying problem.

In any case, here are some of my observations re your code:

  • The first block of code won’t run because you don’t define berttokenizer
  • Replacing berttokenizer with tokenizer (which I assume was your intention), I can set up the pipeline. But when running the pipeline with a sample text, I get the error ValueError: 'decoder_start_token_id' or 'bos_token_id' has to be defined for encoder-decoder generation.
  • When tarballing the model directory, I don’t think you’re supposed to include the root directory (test-model in your case). The inference script will look in mms/models/model/ for the config file, but I believe your config file will end up in mms/models/model/test-model/ (you can verify this as shown below).
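
A quick way to check is to list the archive members; if every path is prefixed with test-model/, the config won’t be where the inference toolkit expects it:

tar -tzf test-model.tar.gz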

Hope that helps!

Cheers
Heiko

Thanks, Heiko, for the response. Sorry, I had to be away for a while. Apologies for the non-working example in the previous comment.

The goal is to save a model and load it so that I can deploy it on SageMaker. Given the model:

from transformers import PreTrainedTokenizerFast
from transformers import EncoderDecoderModel
from transformers import pipeline

tokenizer = PreTrainedTokenizerFast.from_pretrained("bert-base-multilingual-uncased")

tokenizer.bos_token = tokenizer.cls_token
tokenizer.eos_token = tokenizer.sep_token
tokenizer.add_special_tokens({'pad_token': '[PAD]'})

multibert = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-multilingual-uncased", "bert-base-multilingual-uncased"
)

# set special tokens
multibert.config.decoder_start_token_id = tokenizer.bos_token_id
multibert.config.eos_token_id = tokenizer.eos_token_id
multibert.config.pad_token_id = tokenizer.pad_token_id


m = pipeline("translation", model=multibert, tokenizer=tokenizer)
m.save_pretrained('test-model')

It saves to a directory with a structure that looks like this:

! ls test-model/*

test-model/config.json              test-model/special_tokens_map.json
test-model/generation_config.json   test-model/tokenizer_config.json
test-model/pytorch_model.bin        test-model/tokenizer.json

After that I tarball it and push it into an S3 bucket:

! tar -cvzf test-model.tar.gz test-model/*
! aws s3 cp test-model.tar.gz s3://mybucket/test-model.tar.gz

Is that the right way to compress the model into tar.gz format?

And to deploy the model, are there additional steps that need to be checked so that the model can be loaded? E.g., do I need an inference.py file to use the predictor?
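
In case an inference.py is indeed needed, this is roughly what I imagine it would look like: an untested sketch based on the model_fn / predict_fn overrides described for the Hugging Face inference toolkit, with the file placed under code/ inside the tarball.

# Untested sketch of a possible code/inference.py.
# Assumed tarball layout: config.json, pytorch_model.bin, tokenizer files, ..., code/inference.py
from transformers import AutoTokenizer, EncoderDecoderModel


def model_fn(model_dir):
    # model_dir is the unpacked model.tar.gz; load both model and tokenizer from it.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = EncoderDecoderModel.from_pretrained(model_dir)
    return model, tokenizer


def predict_fn(data, model_and_tokenizer):
    # data is the deserialized JSON payload, e.g. {"inputs": "hello world"}.
    model, tokenizer = model_and_tokenizer
    encoded = tokenizer(data["inputs"], return_tensors="pt", padding=True)
    output_ids = model.generate(
        input_ids=encoded["input_ids"],
        attention_mask=encoded["attention_mask"],
    )
    return {"translation_text": tokenizer.batch_decode(output_ids, skip_special_tokens=True)}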

I experienced the same problem. I would like to know if it has been resolved.

I experienced the same problem.
My problem was how I created the tar.gz file: I compressed the folder itself, which is wrong. Go into the folder and compress the files instead.
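
For example (a minimal sketch, assuming the test-model directory from the earlier posts):

cd test-model
tar -czvf ../test-model.tar.gz *
# config.json, pytorch_model.bin, the tokenizer files, etc. are now at the root of the archive
cd ..

Equivalently, tar -czvf test-model.tar.gz -C test-model . does the same thing without changing directories.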