Error deploying BERT on SageMaker

I fine-tuned BERT for text classification on a custom dataset using HuggingFace and TensorFlow, and now I’m trying to deploy the model for inference through SageMaker. I followed this HuggingFace tutorial, but I get the following error. I spent a while looking through the SageMaker HuggingFace documentation, to no avail. The error says that model_uri is set to None, but model_uri is not a parameter that I can pass, and I just want it to pull my model from the HuggingFace Hub.

I also tried downloading the model from the Hub, archiving it, uploading it to S3, and passing model_data="model.tar.gz", but that didn’t work either.

Any help would be greatly appreciated!

Resolved: I just needed to add an image_uri!

Hey @wsunadawong, I moved your post into the Amazon SageMaker category in the forum. I didn’t catch it when you posted it.

Quick question regarding your issue. Did you need to add image_uri to the HuggingFaceModel to run it? This should be the case with using the latest version of sagemaker. Could you please share how you deployed it?

Thanks for your help!

Adding image_uri to the HuggingFaceModel is what worked. At first, I tried image_uris.retrieve(framework='huggingface', region='us-east-1', instance_type='ml.t2.medium', image_scope='inference', base_framework_version='tensorflow2.4.1'), but it gave an error that the image_uri could not be found, so I went to this list of images and chose the image URI with the following properties: TensorFlow 2.4.1 with HuggingFace transformers, inference, CPU, py37.

Then I noticed that MultiDataModel requires launching from S3, so I switched from HuggingFaceModel to Model in order to test pulling the model from the S3 bucket. Now that I look at it, it’s quite possible the endpoint is not using the S3 model at all, because I’m still passing in env=hub. (Could that explain why the Model works but MultiDataModel doesn’t?)

Update: using Model gives the same error as MultiDataModel when I remove env=hub as a parameter. I’m confused about this error, because my counterargument.tar.gz does contain a config.json file.

I saw you used from sagemaker.model import Model. We created a model class, HuggingFaceModel, which you can use without providing an image_uri.
Documentation to this is here: Hugging Face — sagemaker 2.49.2 documentation and Deploy models to Amazon SageMaker
and a small code snippet here

from sagemaker.huggingface import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
# deploy model to SageMaker Inference

We currently have an image for tensorflow 2.4.1.
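For readers landing here, a Hub-based deployment with HuggingFaceModel might look roughly like the sketch below. The helper names, the placeholder model id, and the version strings (transformers 4.6, TensorFlow 2.4, py37) are assumptions chosen to match the image discussed in this thread, not values taken from the original snippet; check them against the containers your SDK release supports.

```python
def build_hub_env(model_id, task):
    """Environment variables the Hugging Face inference container reads
    to load a model from the Hub instead of from model_data."""
    return {"HF_MODEL_ID": model_id, "HF_TASK": task}


def deploy_from_hub(role, model_id, task, instance_type="ml.m5.xlarge"):
    """Hypothetical helper: create the model and deploy one endpoint instance."""
    # Imported lazily so build_hub_env can be used without the SDK installed.
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(
        env=build_hub_env(model_id, task),
        role=role,                     # IAM role allowed to create endpoints
        transformers_version="4.6",    # assumed version
        tensorflow_version="2.4",      # assumed version
        py_version="py37",             # assumed version
    )
    return model.deploy(initial_instance_count=1, instance_type=instance_type)


# "<your-hub-model-id>" is a placeholder, not a real repository name.
hub_env = build_hub_env("<your-hub-model-id>", "text-classification")
print(hub_env)
```

No image_uri is passed here; the class is expected to pick the container from the version arguments.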

I got the model to work with the following snippet.

What does your model.tar.gz look like?

Yes, so I used HuggingFaceModel before, and it did work successfully! The problem is that I want to run multiple models on the same endpoint. Unfortunately, I don’t know how to add a HuggingFaceModel to a MultiDataModel, which is why I was using a plain old Model instead. The only way I know how to add a model is by adding a path to the S3 bucket with add_model(model_data_source, model_data_path). (Here, I cannot specify that I want the model to be a HuggingFaceModel, so I assume it defaults to a regular Model, which is why it fails?) Are you able to get a MultiDataModel running with multiple HuggingFaceModels?

My counterargument.tar.gz is just a zipped version of my HuggingFace git repo:

Let’s move everything related to MultiDataModel to this thread, “Model works but MultiDataModel doesn't” (#10 by dan21c), so we can discuss it there.
If normal deployment works we can “close” this.
Or does normal deployment with your model.tar.gz not work?

Could you share how you created the counterargument.tar.gz archive? Sometimes the structure is:

   - model
        - config.json

That way there is an extra folder and SageMaker might not recognize it.
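One way to check for that extra folder before uploading is to list where config.json actually sits inside the archive. This is only a sketch using the standard tarfile module; the function names and the two tiny demo archives are made up for illustration.

```python
import io
import os
import posixpath
import tarfile
import tempfile


def config_paths(archive_path):
    """Return the normalized path of every config.json in a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return [
            posixpath.normpath(name)
            for name in tar.getnames()
            if posixpath.basename(name) == "config.json"
        ]


def has_root_config(archive_path):
    """True when config.json sits at the top level of the archive."""
    return "config.json" in config_paths(archive_path)


def _write_demo_archive(path, member_name):
    # Demo helper: write an archive containing a single empty JSON file.
    data = b"{}"
    info = tarfile.TarInfo(name=member_name)
    info.size = len(data)
    with tarfile.open(path, "w:gz") as tar:
        tar.addfile(info, io.BytesIO(data))


tmp_dir = tempfile.mkdtemp()
nested = os.path.join(tmp_dir, "nested.tar.gz")
flat = os.path.join(tmp_dir, "flat.tar.gz")
_write_demo_archive(nested, "model/config.json")  # extra folder level
_write_demo_archive(flat, "config.json")          # files at the root

nested_ok = has_root_config(nested)
flat_ok = has_root_config(flat)
print(nested_ok, flat_ok)
```

If the check fails for your own archive, config_paths shows which folder the file ended up in.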

I don’t think normal deployment is working for me. I tried running the code snippet you sent, and it gave me an error about needing to define a TASK.

When I added the HF_TASK in the env parameter, I got the same error as with the MultiDataModel: not being able to find the config.json.

I created the archive by cloning my HuggingFace repo, and running tar -czvf counterargument.tar.gz counterargument_hugging. Here’s the structure:

    - config.json
    - tf_model.h5
    - tokenizer.json
    - special_tokens_map.json
    - tokenizer_config.json
    - vocab.txt

I tried rerunning with the model subfolder structure that you suggested, but it just gave the same “config.json” error unfortunately.

Hey @wsunadawong,

The first image you shared is using your model_data and not the hub configuration.
Could you try it with the hub configuration? If that works, then the counterargument.tar.gz must be wrong.

Could you try creating the archive with the following steps?

  1. Download the model
git lfs install
git clone
  2. Create a tar file
cd counterargument_hugging
tar zcvf model.tar.gz *
  3. Upload model.tar.gz to S3
aws s3 cp model.tar.gz <s3://mymodel>

That worked! The MultiDataModel works now! When I zipped the first time, I included the folder with tar zcvf model.tar.gz counterargument_hugging when I should’ve just included the folder contents. :man_facepalming: Thank you so much for your help. :blush:
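For anyone who prefers to build the archive from Python, the same idea (archive the folder's contents, not the folder itself) can be sketched with the standard tarfile module. The function name and the placeholder files are assumptions for the demo; in practice model_dir would be the cloned repository.

```python
import os
import tarfile
import tempfile


def pack_model_dir(model_dir, archive_path):
    """Archive the contents of model_dir so the files land at the archive root.

    Write archive_path outside model_dir, otherwise the archive would try
    to include itself.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(model_dir)):
            # arcname drops the parent folder from the stored path
            tar.add(os.path.join(model_dir, name), arcname=name)
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


# Demo with empty placeholder files standing in for a real model repo.
model_dir = tempfile.mkdtemp()
for filename in ("config.json", "tf_model.h5"):
    open(os.path.join(model_dir, filename), "w").close()

archive_path = os.path.join(tempfile.mkdtemp(), "model.tar.gz")
names = pack_model_dir(model_dir, archive_path)
print(names)
```

The printed member names should have no folder prefix, which is the layout that worked in this thread.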


Hi @wsunadawong and @philschmid

I am facing an error when deploying a fine-tuned BERT model on sagemaker. I have trained the model already and the model is in S3 (model.tar.gz). When I try running the snippet of code below, I get an error calling the deploy(...). → TypeError: expected str, bytes or os.PathLike object, not NoneType

from sagemaker.huggingface.model import HuggingFaceModel
import sagemaker 

role = sagemaker.get_execution_role()

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data="s3://sagemaker/huggingface-pytorch-training/output/model.tar.gz",  # path to your trained sagemaker model
    role=role, # iam role with permissions to create an Endpoint
    transformers_version="4.6", # transformers version used
    pytorch_version="1.7", # pytorch version used
    py_version="py36", # python version of the DLC
)

Then deploy

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.m5.xlarge"
)

And this is the error

TypeError                                 Traceback (most recent call last)
<ipython-input-36-da06cee412a2> in <module>
      2 predictor = huggingface_model.deploy(
      3    initial_instance_count=1,
----> 4    instance_type="ml.m5.xlarge"
      5 )

/opt/conda/lib/python3.6/site-packages/sagemaker/ in deploy(self, initial_instance_count, instance_type, serializer, deserializer, accelerator_type, endpoint_name, tags, kms_key, wait, data_capture_config, **kwargs)
    761             if self._base_name is not None:
    762                 self._base_name = "-".join((self._base_name, compiled_model_suffix))
--> 763 
    764         self._create_sagemaker_model(instance_type, accelerator_type, tags)
    765         production_variant = sagemaker.production_variant(

/opt/conda/lib/python3.6/site-packages/sagemaker/ in _create_sagemaker_model(self, instance_type, accelerator_type, tags)
    315         Args:
    316             output_path (str): where in S3 to store the output of the job
--> 317             role (str): what role to use when executing the job
    318             packaging_job_name (str): what to name the packaging job
    319             compilation_job_name (str): what compilation job to source the model from

/opt/conda/lib/python3.6/site-packages/sagemaker/huggingface/ in prepare_container_def(self, instance_type, accelerator_type)
    270         deploy_key_prefix = model_code_key_prefix(self.key_prefix, self.name, deploy_image)
--> 271         self._upload_code(deploy_key_prefix, repack=True)
    272         deploy_env = dict(self.env)
    273         deploy_env.update(self._framework_env_vars())

/opt/conda/lib/python3.6/site-packages/sagemaker/ in _upload_code(self, key_prefix, repack)
   1136             utils.repack_model(
   1137                 inference_script=self.entry_point,
-> 1138                 source_directory=self.source_dir,
   1139                 dependencies=self.dependencies,
   1140                 model_uri=self.model_data,

/opt/conda/lib/python3.6/site-packages/sagemaker/ in repack_model(inference_script, source_directory, dependencies, model_uri, repacked_model_uri, sagemaker_session, kms_key)
    414         _create_or_update_code_dir(
--> 415             model_dir, inference_script, source_directory, dependencies, sagemaker_session, tmp
    416         )

/opt/conda/lib/python3.6/site-packages/sagemaker/ in _create_or_update_code_dir(model_dir, inference_script, source_directory, dependencies, sagemaker_session, tmp)
    461             os.mkdir(code_dir)
    462         try:
--> 463             shutil.copy2(inference_script, code_dir)
    464         except FileNotFoundError:
    465             if os.path.exists(os.path.join(code_dir, inference_script)):

/opt/conda/lib/python3.6/ in copy2(src, dst, follow_symlinks)
    260     """
    261     if os.path.isdir(dst):
--> 262         dst = os.path.join(dst, os.path.basename(src))
    263     copyfile(src, dst, follow_symlinks=follow_symlinks)
    264     copystat(src, dst, follow_symlinks=follow_symlinks)

/opt/conda/lib/python3.6/ in basename(p)
    144 def basename(p):
    145     """Returns the final component of a pathname"""
--> 146     p = os.fspath(p)
    147     sep = _get_sep(p)
    148     i = p.rfind(sep) + 1

TypeError: expected str, bytes or os.PathLike object, not NoneType

I opened the model.tar.gz locally and this is the content of the folder:

Any help is much appreciated!

Hey @cfloressuazo,

Thank you for opening the thread and providing the information.
Looking at the folder structure of your model.tar.gz, I see that the tokenizer is missing. How did you save your model + tokenizer in your training script?

Hey @philschmid, thanks for the quick answer!

I am using the script that is part of the tutorial from Huggingface. I’m pasting it in here:

from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from datasets import load_from_disk
import random
import logging
import sys
import argparse
import os
import torch

if __name__ == "__main__":

    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--train-batch-size", type=int, default=32)
    parser.add_argument("--eval-batch-size", type=int, default=64)
    parser.add_argument("--warmup_steps", type=int, default=500)
    parser.add_argument("--model_name", type=str)
    parser.add_argument("--learning_rate", type=str, default=5e-5)

    # Data, model, and output directories
    parser.add_argument("--output-data-dir", type=str, default=os.environ["SM_OUTPUT_DATA_DIR"])
    parser.add_argument("--model-dir", type=str, default=os.environ["SM_MODEL_DIR"])
    parser.add_argument("--n_gpus", type=str, default=os.environ["SM_NUM_GPUS"])
    parser.add_argument("--training_dir", type=str, default=os.environ["SM_CHANNEL_TRAIN"])
    parser.add_argument("--test_dir", type=str, default=os.environ["SM_CHANNEL_TEST"])

    args, _ = parser.parse_known_args()

    # Set up logging
    logger = logging.getLogger(__name__)

    logging.basicConfig(
        level=logging.INFO,
        handlers=[logging.StreamHandler(sys.stdout)],
        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    )

    # load datasets
    train_dataset = load_from_disk(args.training_dir)
    test_dataset = load_from_disk(args.test_dir)

    logger.info(f" loaded train_dataset length is: {len(train_dataset)}")
    logger.info(f" loaded test_dataset length is: {len(test_dataset)}")

    # compute metrics function for binary classification
    def compute_metrics(pred):
        labels = pred.label_ids
        preds = pred.predictions.argmax(-1)
        precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average="binary")
        acc = accuracy_score(labels, preds)
        return {"accuracy": acc, "f1": f1, "precision": precision, "recall": recall}

    # download model from model hub
    model = AutoModelForSequenceClassification.from_pretrained(args.model_name)

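    # define training args
    training_args = TrainingArguments(
        output_dir=args.model_dir,
        num_train_epochs=args.epochs,
        per_device_train_batch_size=args.train_batch_size,
        per_device_eval_batch_size=args.eval_batch_size,
        warmup_steps=args.warmup_steps,
        learning_rate=float(args.learning_rate),
    )

    # create Trainer instance
    trainer = Trainer(
        model=model,
        args=training_args,
        compute_metrics=compute_metrics,
        train_dataset=train_dataset,
        eval_dataset=test_dataset,
    )

    # train model
    trainer.train()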
    # evaluate model
    eval_result = trainer.evaluate(eval_dataset=test_dataset)

    # writes eval result to file which can be accessed later in s3 output
    with open(os.path.join(args.output_data_dir, "eval_results.txt"), "w") as writer:
        print(f"***** Eval results *****")
        for key, value in sorted(eval_result.items()):
            writer.write(f"{key} = {value}\n")

    # Saves the model to s3; default is /opt/ml/model which SageMaker sends to S3
    trainer.save_model(args.model_dir)

The tokenizer is missing there, so when you call trainer.save_model it saves only the model and not the tokenizer.
If you take a look at: notebooks/ at 87cde801e765caf5936808c61d4edd62cc32abf1 · huggingface/notebooks · GitHub
The tokenizer is loaded and passed to the Trainer to save it as well.
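If changing the Trainer arguments is not convenient, another option is to save the tokenizer explicitly into the same directory as the model. The sketch below only relies on trainer.save_model(...) and tokenizer.save_pretrained(...); the helper name is made up, and the two stand-in classes exist only so the snippet runs without transformers installed.

```python
import os
import tempfile


def save_model_and_tokenizer(trainer, tokenizer, model_dir):
    """Write model files and tokenizer files into the same directory."""
    os.makedirs(model_dir, exist_ok=True)
    trainer.save_model(model_dir)         # model config + weights
    tokenizer.save_pretrained(model_dir)  # tokenizer files
    return sorted(os.listdir(model_dir))


# In the training script this would be roughly:
#   tokenizer = AutoTokenizer.from_pretrained(args.model_name)
#   save_model_and_tokenizer(trainer, tokenizer, args.model_dir)


class _FakeTrainer:
    """Stand-in for transformers.Trainer, used only for this demo."""

    def save_model(self, output_dir):
        open(os.path.join(output_dir, "config.json"), "w").close()


class _FakeTokenizer:
    """Stand-in for a Hugging Face tokenizer, used only for this demo."""

    def save_pretrained(self, save_directory):
        open(os.path.join(save_directory, "tokenizer_config.json"), "w").close()


saved = save_model_and_tokenizer(_FakeTrainer(), _FakeTokenizer(), tempfile.mkdtemp())
print(saved)
```

Either way, the goal is that the resulting model.tar.gz contains both the model files and the tokenizer files.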

Can you share the link where you found this?

I see! That makes total sense. I copied the script, without any modifications, from this Hugging Face blog post: The Partnership: Amazon SageMaker and Hugging Face. Maybe that tutorial is outdated or requires some changes.

Another question: I haven’t defined anywhere how the data is passed to the model for inference. I haven’t got to that point yet, but I am assuming the data should be passed as JSON with inputs as the key and an array of texts as the value? Something like this: data = {'inputs': ['text_1', 'text_2', ...]}. And is that regardless of the dataset and format that I used for training?

Thank you again!


Thank you! I’ll fix it there.

For inference, the Hugging Face Inference Toolkit is used, which is, simply put, a SageMaker-compatible wrapper around the pipelines object, with an API structure similar to the Inference API. You can find more documentation here: Reference
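As a rough illustration of that request shape, assuming the format from the question above (a JSON object with an inputs key), a payload could be built like this. The helper name is hypothetical, and the predictor call is left as a comment because it needs a live endpoint.

```python
import json


def build_payload(texts):
    """Wrap the input texts in the {"inputs": ...} structure."""
    return {"inputs": texts}


payload = build_payload(["text_1", "text_2"])
body = json.dumps(payload)  # what would travel over the wire as JSON
print(body)

# With a deployed endpoint, something like:
#   result = predictor.predict(payload)
```

The payload shape depends on the task served by the endpoint, not on the format of the training dataset.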