Inference Toolkit - custom inference with multiple models


I’m trying to perform custom inference, where I need a model and a tokenizer hosted at two different repositories on Hugging Face. I have looked at the sample custom inference notebook; however, it only uses a single model.

Similarly, the code summarization notebook also loads the tokenizer and model from the same directory. I want to implement something similar, but with the model and tokenizer hosted at two different Hugging Face repositories.

If I load a model and a tokenizer hosted at two different Hugging Face repos, pack them into a tar.gz file, and finally push that to S3, what should the directory structure of this tar.gz be? I have tried the following without success:

        model1/
            model_config.json (along with other model files)
        tokenizer/
            (tokenizer files)

My custom model loading function looks like:

from transformers import AutoModel, T5Tokenizer

def model_fn(model_dir):
    model = AutoModel.from_pretrained(f"{model_dir}/model1")
    tokenizer = T5Tokenizer.from_pretrained(f"{model_dir}/tokenizer")
    return model, tokenizer
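For context, the Inference Toolkit passes whatever model_fn returns to predict_fn as its second argument, so the tuple has to be unpacked there. A sketch of such a handler, assuming T5-style summarization (the "inputs"/"summary" keys and the generate call are illustrative assumptions, not my actual handler):

```python
# Sketch only: predict_fn receives the (model, tokenizer) tuple that
# model_fn returned, and must unpack it before using either object.
def predict_fn(data, model_and_tokenizer):
    model, tokenizer = model_and_tokenizer  # unpack the tuple from model_fn
    text = data.get("inputs", "")  # assumed request key, not a fixed contract
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids)
    return {"summary": tokenizer.decode(output_ids[0], skip_special_tokens=True)}
```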

And my model creation function looks like:

from sagemaker.huggingface.model import HuggingFaceModel

huggingface_model = HuggingFaceModel(entry_point='',

The error I’m seeing is:

OSError: file /.sagemaker/mms/models/model/config.json not found.

I’d appreciate some help figuring out whether my directory structure is incorrect, and what it should be. If there is a better way to achieve this, please suggest it.

As per my understanding, your “code” folder should sit at the top level of the archive, not where it is currently. Also, you need to download a model snapshot, place the “code” folder in it, add any other files your tokenizer might require, and .tar.gz them.
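A minimal sketch of that packaging step, assuming a working directory laid out with the model1/, tokenizer/, and code/inference.py names implied by the model_fn above (the helper function itself is hypothetical):

```python
import os
import tarfile

def build_model_archive(workdir, archive_path="model.tar.gz"):
    """Tar the *contents* of workdir so that model1/, tokenizer/ and code/
    end up at the top level of the archive, with no leading directory."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(workdir)):
            # arcname=name keeps entries relative, e.g. "code/inference.py",
            # which is what model_fn's f"{model_dir}/model1" lookup expects.
            tar.add(os.path.join(workdir, name), arcname=name)
    return archive_path
```

The resulting archive should then contain entries like model1/config.json, tokenizer/..., and code/inference.py directly at its root, ready to be uploaded to S3 and referenced as model_data.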