Error deploying BERT on SageMaker

wsunadawong · July 15, 2021, 8:03pm

I fine-tuned BERT for text-classification on a custom dataset using HuggingFace and Tensorflow, and now I’m trying to deploy the model for inference through SageMaker. I followed this HuggingFace tutorial but I get the following error. I spent a while looking through the SageMaker HuggingFace documentation to no avail. The error says that model_uri is set to None, but model_uri is not a parameter that I can pass, and I just want it to pull my model from the HuggingFace Hub.

I also tried downloading the model from the Hub, zipping it, uploading it to S3, and passing model_data=“model.tar.gz”, but that didn’t work either.

Any help would be greatly appreciated!

wsunadawong · July 16, 2021, 3:23pm

Resolved: I just needed to add an image_uri!

philschmid · July 21, 2021, 7:06am

Hey @wsunadawong, I moved your post into the Amazon SageMaker category in the forum. I didn’t catch it when you posted it.

Quick question regarding your issue. Did you need to add image_uri to the HuggingFaceModel to run it? This should be the case with using the latest version of sagemaker. Could you please share how you deployed it?

wsunadawong · July 21, 2021, 12:33pm

Thanks for your help!

Adding image_uri to the HuggingFaceModel worked. I first tried image_uris.retrieve(framework='huggingface',region='us-east-1', instance_type='ml.t2.medium',image_scope='inference',base_framework_version='tensorflow2.4.1'), but it gave an error that the image_uri could not be found, so I went to this list of images and chose the image URI with the following properties: TensorFlow 2.4.1 with HuggingFace transformers, inference, CPU, py37.

Then I noticed that MultiDataModel requires launching from S3, so I switched from HuggingFaceModel to Model in order to test pulling the model from the S3 bucket. Now that I look at it, it’s quite possible the endpoint is not using the S3 model at all, because I’m still passing in env=hub. (Could that explain why the Model works but MultiDataModel doesn’t?)

wsunadawong · July 21, 2021, 3:39pm

Update: using Model gives the same error as MultiDataModel when I remove env=hub as a parameter. I’m confused about this error, because my counterargument.tar.gz does contain a config.json file.

philschmid · July 22, 2021, 9:18am

I saw you used from sagemaker.model import Model. We created a model class, HuggingFaceModel, which you can use without providing an image_uri.
The documentation for it is here: Hugging Face — sagemaker 2.109.0 documentation and Deploy models to Amazon SageMaker
and here is a small code snippet:

from sagemaker.huggingface import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.6',
    tensorflow_version='2.4',
    py_version='py37',
    model_data='s3://my-trained-model/artifcats/model.tar.gz',
    role=role,
)
# deploy model to SageMaker Inference
huggingface_model.deploy(initial_instance_count=1,instance_type="ml.m5.xlarge")

We currently have an image for tensorflow 2.4.1.

I got the model to work with the following snippet.

What does your model.tar.gz look like?

wsunadawong · July 22, 2021, 11:53am

Yes, so I used HuggingFaceModel before, and it did work successfully! The problem is that I want to run multiple models on the same endpoint. Unfortunately, I don’t know how to add a HuggingFaceModel to a MultiDataModel, which is why I was using a plain old Model instead. The only way I know how to add a model is by adding a path to the S3 bucket with add_model(model_data_source, model_data_path). (Here, I cannot specify that I want the model to be a HuggingFaceModel, so I assume it defaults to a regular Model, which is why it fails?) Are you able to get a MultiDataModel running with multiple HuggingFaceModels?

wsunadawong · July 22, 2021, 11:55am

My counterargument.tar.gz is just a zipped version of my HuggingFace git repo:

philschmid · July 22, 2021, 12:56pm

Let’s move everything related to MultiDataModel to the thread “Model works but MultiDataModel doesn't” so we can discuss it there.
If normal deployment works we can “close” this.
Or does normal deployment with your model.tar.gz not work?

Could you share how you created the counterargument.tar.gz archive? Sometimes the structure is

model.tar.gz
   - model
        - config.json

That way there is an extra folder and SageMaker might not recognize it.
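One way to check for the extra folder before uploading is to list the archive and confirm that config.json sits at the top level. The following is a minimal sketch using only the Python standard library; the helper name has_config_at_root is hypothetical and not part of the SageMaker SDK.

```python
import tarfile


def has_config_at_root(archive_path):
    """Return True if config.json is at the top level of a .tar.gz archive.

    Hypothetical helper for illustration; it only inspects member names.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
    # Some tar invocations prefix members with "./"; strip that before comparing.
    normalized = [n[2:] if n.startswith("./") else n for n in names]
    return "config.json" in normalized
```

An archive whose members look like model/config.json would return False here, which matches the nested layout described above.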

wsunadawong · July 22, 2021, 5:13pm

I don’t think normal deployment is working for me. I tried running the code snippet you sent, and it gave me an error about needing to define a TASK.

When I added the HF_TASK in the env parameter, I got the same error as with the MultiDataModel: not being able to find the config.json.
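For readers following along, the env parameter mentioned here is a dictionary of environment variables read by the inference container. A minimal sketch of such a dictionary is shown below; build_hub_env is a hypothetical helper, the model ID is the repository named later in this thread, and the task value is an assumption based on the model being fine-tuned for text classification.

```python
def build_hub_env(model_id, task):
    """Build the environment dict passed as env=... (hypothetical helper).

    HF_MODEL_ID tells the container which Hub repository to pull;
    HF_TASK names the pipeline task to serve.
    """
    return {"HF_MODEL_ID": model_id, "HF_TASK": task}


# Assumed values for illustration only.
hub = build_hub_env("wsunadawong/counterargument_hugging", "text-classification")

# The dict would then be passed along the lines of HuggingFaceModel(env=hub, ...).
# When deploying from model_data in S3 instead of the Hub, HF_MODEL_ID is
# typically left out and only HF_TASK is set.
```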

I created the archive by cloning my HuggingFace repo, and running tar -czvf counterargument.tar.gz counterargument_hugging. Here’s the structure:

counterargument.tar.gz
    - config.json
    - tf_model.h5
    - tokenizer.json
    - special_tokens_map.json
    - tokenizer_config.json
    - vocab.txt

I tried rerunning with the model subfolder structure that you suggested, but it just gave the same “config.json” error unfortunately.

philschmid · July 23, 2021, 8:52am

Hey @wsunadawong,

The first image you shared is using your model_data and not the Hub configuration.
Could you try it with the Hub configuration? If that works, then counterargument.tar.gz must be the problem.

Could you try creating the archive with the following steps?

Download the model

git lfs install
git clone https://huggingface.co/wsunadawong/counterargument_hugging

Create a tar file

cd counterargument_hugging
tar zcvf model.tar.gz *

Upload model.tar.gz to s3

aws s3 cp model.tar.gz <s3://mymodel>
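The tar step above can also be sketched in Python, which makes the key point explicit: each file is added under its bare name, so the files end up at the root of the archive rather than inside a subfolder. This is an illustrative equivalent, not the method used in the thread; make_model_archive is a hypothetical name. Like the shell glob *, it skips hidden entries such as .git.

```python
import os
import tarfile


def make_model_archive(model_dir, archive_path="model.tar.gz"):
    """Pack the contents of model_dir at the root of a .tar.gz archive."""
    archive_abs = os.path.abspath(archive_path)
    # List entries before creating the archive so it never includes itself.
    entries = [
        e for e in sorted(os.listdir(model_dir))
        if not e.startswith(".")
        and os.path.abspath(os.path.join(model_dir, e)) != archive_abs
    ]
    with tarfile.open(archive_path, "w:gz") as tar:
        for entry in entries:
            # arcname drops the parent folder from the stored path.
            tar.add(os.path.join(model_dir, entry), arcname=entry)
    return archive_path
```

Calling make_model_archive("counterargument_hugging") would write model.tar.gz to the current directory, ready for the upload step.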

wsunadawong · July 23, 2021, 3:58pm

That worked! The MultiDataModel works now! When I zipped the first time, I included the folder with tar zcvf model.tar.gz counterargument_hugging when I should’ve just included the folder contents. Thank you so much for your help.

cfloressuazo · November 19, 2021, 6:19am

Hi @wsunadawong and @philschmid

I am facing an error when deploying a fine-tuned BERT model on sagemaker. I have trained the model already and the model is in S3 (model.tar.gz). When I try running the snippet of code below, I get an error calling the deploy(...). → TypeError: expected str, bytes or os.PathLike object, not NoneType

from sagemaker.huggingface.model import HuggingFaceModel
import sagemaker 

role = sagemaker.get_execution_role()

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data="s3://sagemaker/huggingface-pytorch-training/output/model.tar.gz",  # path to your trained sagemaker model
    role=role, # iam role with permissions to create an Endpoint
    transformers_version="4.6", # transformers version used
    pytorch_version="1.7", # pytorch version used
    py_version="py36", # python version of the DLC
)

Then deploy

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.m5.xlarge"
)

And this is the error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-36-da06cee412a2> in <module>
      2 predictor = huggingface_model.deploy(
      3    initial_instance_count=1,
----> 4    instance_type="ml.m5.xlarge"
      5 )

/opt/conda/lib/python3.6/site-packages/sagemaker/model.py in deploy(self, initial_instance_count, instance_type, serializer, deserializer, accelerator_type, endpoint_name, tags, kms_key, wait, data_capture_config, **kwargs)
    761             if self._base_name is not None:
    762                 self._base_name = "-".join((self._base_name, compiled_model_suffix))
--> 763 
    764         self._create_sagemaker_model(instance_type, accelerator_type, tags)
    765         production_variant = sagemaker.production_variant(

/opt/conda/lib/python3.6/site-packages/sagemaker/model.py in _create_sagemaker_model(self, instance_type, accelerator_type, tags)
    315         Args:
    316             output_path (str): where in S3 to store the output of the job
--> 317             role (str): what role to use when executing the job
    318             packaging_job_name (str): what to name the packaging job
    319             compilation_job_name (str): what compilation job to source the model from

/opt/conda/lib/python3.6/site-packages/sagemaker/huggingface/model.py in prepare_container_def(self, instance_type, accelerator_type)
    269 
    270         deploy_key_prefix = model_code_key_prefix(self.key_prefix, self.name, deploy_image)
--> 271         self._upload_code(deploy_key_prefix, repack=True)
    272         deploy_env = dict(self.env)
    273         deploy_env.update(self._framework_env_vars())

/opt/conda/lib/python3.6/site-packages/sagemaker/model.py in _upload_code(self, key_prefix, repack)
   1136             utils.repack_model(
   1137                 inference_script=self.entry_point,
-> 1138                 source_directory=self.source_dir,
   1139                 dependencies=self.dependencies,
   1140                 model_uri=self.model_data,

/opt/conda/lib/python3.6/site-packages/sagemaker/utils.py in repack_model(inference_script, source_directory, dependencies, model_uri, repacked_model_uri, sagemaker_session, kms_key)
    413 
    414         _create_or_update_code_dir(
--> 415             model_dir, inference_script, source_directory, dependencies, sagemaker_session, tmp
    416         )
    417 

/opt/conda/lib/python3.6/site-packages/sagemaker/utils.py in _create_or_update_code_dir(model_dir, inference_script, source_directory, dependencies, sagemaker_session, tmp)
    461             os.mkdir(code_dir)
    462         try:
--> 463             shutil.copy2(inference_script, code_dir)
    464         except FileNotFoundError:
    465             if os.path.exists(os.path.join(code_dir, inference_script)):

/opt/conda/lib/python3.6/shutil.py in copy2(src, dst, follow_symlinks)
    260     """
    261     if os.path.isdir(dst):
--> 262         dst = os.path.join(dst, os.path.basename(src))
    263     copyfile(src, dst, follow_symlinks=follow_symlinks)
    264     copystat(src, dst, follow_symlinks=follow_symlinks)

/opt/conda/lib/python3.6/posixpath.py in basename(p)
    144 def basename(p):
    145     """Returns the final component of a pathname"""
--> 146     p = os.fspath(p)
    147     sep = _get_sep(p)
    148     i = p.rfind(sep) + 1

TypeError: expected str, bytes or os.PathLike object, not NoneType

I opened the model.tar.gz locally and this is the content of the folder:

Any help is much appreciated!

philschmid · November 19, 2021, 7:57am

Hey @cfloressuazo,

Thank you for opening the thread and providing the information.
Looking at the folder structure of your model.tar.gz, I see that the tokenizer is missing. How did you save your model and tokenizer in your training script?

cfloressuazo · November 19, 2021, 1:37pm

Hey @philschmid, thanks for the quick answer!

I am using the train.py script that is part of the tutorial from Huggingface. I’m pasting it in here:

from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from datasets import load_from_disk
import random
import logging
import sys
import argparse
import os
import torch

if __name__ == "__main__":

    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--train-batch-size", type=int, default=32)
    parser.add_argument("--eval-batch-size", type=int, default=64)
    parser.add_argument("--warmup_steps", type=int, default=500)
    parser.add_argument("--model_name", type=str)
    parser.add_argument("--learning_rate", type=str, default=5e-5)

    # Data, model, and output directories
    parser.add_argument("--output-data-dir", type=str, default=os.environ["SM_OUTPUT_DATA_DIR"])
    parser.add_argument("--model-dir", type=str, default=os.environ["SM_MODEL_DIR"])
    parser.add_argument("--n_gpus", type=str, default=os.environ["SM_NUM_GPUS"])
    parser.add_argument("--training_dir", type=str, default=os.environ["SM_CHANNEL_TRAIN"])
    parser.add_argument("--test_dir", type=str, default=os.environ["SM_CHANNEL_TEST"])

    args, _ = parser.parse_known_args()

    # Set up logging
    logger = logging.getLogger(__name__)

    logging.basicConfig(
        level=logging.getLevelName("INFO"),
        handlers=[logging.StreamHandler(sys.stdout)],
        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    )

    # load datasets
    train_dataset = load_from_disk(args.training_dir)
    test_dataset = load_from_disk(args.test_dir)

    logger.info(f" loaded train_dataset length is: {len(train_dataset)}")
    logger.info(f" loaded test_dataset length is: {len(test_dataset)}")

    # compute metrics function for binary classification
    def compute_metrics(pred):
        labels = pred.label_ids
        preds = pred.predictions.argmax(-1)
        precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average="binary")
        acc = accuracy_score(labels, preds)
        return {"accuracy": acc, "f1": f1, "precision": precision, "recall": recall}

    # download model from model hub
    model = AutoModelForSequenceClassification.from_pretrained(args.model_name)

    # define training args
    training_args = TrainingArguments(
        output_dir=args.model_dir,
        num_train_epochs=args.epochs,
        per_device_train_batch_size=args.train_batch_size,
        per_device_eval_batch_size=args.eval_batch_size,
        warmup_steps=args.warmup_steps,
        evaluation_strategy="epoch",
        logging_dir=f"{args.output_data_dir}/logs",
        learning_rate=float(args.learning_rate),
    )

    # create Trainer instance
    trainer = Trainer(
        model=model,
        args=training_args,
        compute_metrics=compute_metrics,
        train_dataset=train_dataset,
        eval_dataset=test_dataset,
    )

    # train model
    trainer.train()

    # evaluate model
    eval_result = trainer.evaluate(eval_dataset=test_dataset)

    # writes eval result to file which can be accessed later in s3 ouput
    with open(os.path.join(args.output_data_dir, "eval_results.txt"), "w") as writer:
        print(f"***** Eval results *****")
        for key, value in sorted(eval_result.items()):
            writer.write(f"{key} = {value}\n")

    # Saves the model to s3; default is /opt/ml/model which SageMaker sends to S3
    trainer.save_model(args.model_dir)

philschmid · November 19, 2021, 3:01pm

cfloressuazo:

    trainer = Trainer(
        model=model,
        args=training_args,
        compute_metrics=compute_metrics,
        train_dataset=train_dataset,
        eval_dataset=test_dataset,
    )

The tokenizer is missing there, so when you call trainer.save_model it saves only the model and not the tokenizer.
If you take a look at: https://github.com/huggingface/notebooks/blob/87cde801e765caf5936808c61d4edd62cc32abf1/sagemaker/01_getting_started_pytorch/scripts/train.py#L58
The tokenizer is loaded and passed to the Trainer to save it as well.
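As an alternative to passing the tokenizer to the Trainer, the tokenizer can be written to the model directory explicitly at the end of the script. The sketch below assumes a tokenizer loaded earlier with AutoTokenizer.from_pretrained(args.model_name); the function name save_model_and_tokenizer is hypothetical.

```python
def save_model_and_tokenizer(trainer, tokenizer, model_dir):
    """Save model weights and tokenizer files side by side in model_dir.

    trainer is a transformers Trainer and tokenizer is the tokenizer
    loaded for the same checkpoint (assumed to exist in the script).
    """
    # Writes config.json and the model weights.
    trainer.save_model(model_dir)
    # Writes the tokenizer files (vocabulary, tokenizer config, special tokens).
    tokenizer.save_pretrained(model_dir)
```

In the script above, this would replace the final trainer.save_model(args.model_dir) call, so that the resulting model.tar.gz also contains the tokenizer files.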

Can you share the link where you found this?

cfloressuazo · November 19, 2021, 3:29pm

I see! It makes total sense. I copied and pasted, without any modifications, the train.py script from the Hugging Face blog post “The Partnership: Amazon SageMaker and Hugging Face”. Maybe that tutorial is outdated or requires some changes.

Another question: I haven’t defined anywhere how the data is passed to the model for inference. I haven’t got to that point yet, but I am assuming the data should be passed as JSON with the key inputs and an array of texts as the value? Something like this: data = {'inputs': ['text_1', 'text_2', ...]}. And is that regardless of the dataset and format I used for training?

Thank you again!

philschmid · November 22, 2021, 8:47am

Thank you! I’ll fix it there.

For inference, the Hugging Face Inference Toolkit is used, which is, simply put, a SageMaker-compatible wrapper around the pipelines object, with an API structure similar to the Inference API. You can find more documentation here: Reference
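As a rough illustration of that request structure, the sketch below builds a payload with an "inputs" key, following the shape proposed in the question above. The helper build_payload is hypothetical, and whether a list of texts is accepted for a given task is an assumption to verify against the toolkit documentation.

```python
import json


def build_payload(texts):
    """Wrap one text or a list of texts in an {"inputs": ...} request body."""
    if isinstance(texts, str):
        return {"inputs": texts}
    return {"inputs": list(texts)}


data = build_payload(["text_1", "text_2"])

# Serialized form, as it would travel in a JSON request body.
body = json.dumps(data)
```

With a predictor returned by huggingface_model.deploy(...), the dictionary would be passed as predictor.predict(data).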

I00N · April 1, 2025, 9:07am

Hello, I’m currently facing an issue similar to the ones in this conversation. I fine-tuned Llama 3.1 and I’m trying to deploy it to SageMaker directly from Hugging Face. It deploys to an endpoint successfully, but when I try to call the endpoint, I get an error that the inference script is missing. Is there any way to attach the inference script to it after pulling from HF?

pagezyhf · April 1, 2025, 2:23pm

Hello,
I think you can use a TGI container instead of the Hugging Face Inference DLC to simplify the process.

On the model page of your base model, click on Deploy > Sagemaker > check the Sagemaker SDK code snippet and just replace with your token and private model ID of your fine tuned model.

Cheers,
Simon
