Transformers 4.9.0 on SageMaker


I am working on deploying a speech-recognition app using Hugging Face, following the instructions here. My understanding is that the inference toolkit uses pipelines, but speech recognition was only introduced in the 4.9.0 and later releases, whereas the current AWS images point to 4.6.x.

Is there any way around this? What would you suggest I do to make the deployment work? My hunch is that I need to supply a new image_uri.

Thank you!


Hello @dzorlu,

Great to hear that you are working on a speech task! Yes, the inference toolkit uses the pipelines from transformers. The code is open source if you want to take a deeper look: GitHub - aws/sagemaker-huggingface-inference-toolkit.

I am happy to share that we are working on new releases for the DLC, which include 4.9 and higher. Sadly, I think it will take around 2 more weeks for them to be available.

In the meantime, you could use the official DLC and provide as model_data a model.tar.gz which contains a custom module, documented here: Deploy models to Amazon SageMaker
With a custom module, you can provide a requirements.txt to upgrade the dependencies and a custom model_fn to load the ASR pipeline.
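A minimal sketch of such a model_fn, assuming the inference toolkit's model_fn(model_dir) override contract and a requirements.txt that upgrades transformers to a version that supports the task (the lazy import keeps the module loadable before the upgraded dependencies are installed):

```python
# inference.py inside the code/ directory of model.tar.gz -- a sketch,
# not the toolkit's built-in handler.
def model_fn(model_dir):
    # Imported lazily so the module loads even before requirements.txt
    # has upgraded transformers to a release that knows this task.
    from transformers import pipeline

    # Load the ASR pipeline from the model artifacts unpacked in model_dir.
    return pipeline("automatic-speech-recognition", model=model_dir)
```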

1 Like

Thank you Philipp for the quick response and the direction. I will try the model_data route. It’s also great to hear that you are working on updating the DLCs. The team’s work is always much appreciated!

Hi @philschmid, I am trying exactly the same thing as @dzorlu, so I was pleasantly surprised to see that someone else asked this question a few months ago. I am trying to deploy a speech recognition model from the Hugging Face model hub for inference on a SageMaker notebook using the Inference Toolkit, following the instructions here. However, when I do predictor.predict({'inputs': "audio.wav"}), I get the error Unknown task automatic-speech-recognition .... I guess this is because of the transformers version in the DLC?


If so, is there meanwhile a way to use a newer version than 4.6.1 on Sagemaker so that I can use the inference toolkit to deploy an ASR model from the hub? Thanks in advance!

Hi Paras, there are indeed many newer versions of the HF SageMaker DLCs out; see here for reference: Reference


Hey @parasmehta,

currently, ASR is not yet supported for zero-code deployments.
We are working on shipping it with the next version of the inference toolkit, hopefully within 2-4 weeks. Until then, you can create a custom module for it.
You can follow this example: notebooks/sagemaker-notebook.ipynb at master · huggingface/notebooks · GitHub. Since you are using audio data, you additionally need to adjust the input_fn.
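A sketch of what that input_fn override might look like, assuming the endpoint receives the raw contents of the wav file with an audio/wav content type (both the content-type names and the pass-through behaviour are assumptions, not the toolkit's defaults):

```python
# Part of a custom inference.py -- a sketch, not the toolkit's built-in handler.
def input_fn(request_body, content_type):
    if content_type in ("audio/wav", "audio/x-wav"):
        # Hand the raw bytes straight to the ASR pipeline, which decodes
        # them with ffmpeg at the model's sampling rate.
        return request_body
    raise ValueError(f"Unsupported content type: {content_type}")
```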

Thank you for your answers @philschmid and @marshmellow77.
I was very happy to see the list of SageMaker DLCs, but then I realised that the version of transformers that I want to use (v4.15.0) is not yet supported in the latest DLCs :cry: They currently support up to transformers 4.12.3.
So, what are my options if I want to use a model from transformers v4.15.0 for inference on SageMaker? If you could point me to any tutorials/documentation on this, that would be helpful too. Thanks!

Hey @parasmehta,

we are in the middle of releasing a new DLC with transformers version 4.17.0; you can follow this PR: feature: Hugging Face Transformers 4.17 for PT 1.10 by saimidu · Pull Request #3011 · aws/sagemaker-python-sdk · GitHub
In the meantime, you can add a custom requirements.txt and pin the version you want to use.
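For example, a one-line requirements.txt bundled in the code/ directory of the model.tar.gz could pin the newer release (4.15.0 here, matching the version asked about above):

```
transformers==4.15.0
```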

1 Like

Thank you for the pointer @philschmid. In that PR, I noticed that some containers for version 4.17.0 have already been released as mentioned here: Release v1.0-hf-4.17.0-pt-1.10.2-py38 · aws/deep-learning-containers · GitHub
This nearly made my day :smiley: I tried to use one of these images and run ASR inference on a simple wav file like this:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

# Hub Model configuration.
hub = {
	'HF_MODEL_ID': '...',  # model id from the Hub (elided here)
	'HF_TASK': 'automatic-speech-recognition'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	env=hub,
	role=role,
	image_uri='...'  # the 4.17.0 DLC image mentioned above (elided here)
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.m5.xlarge' # ec2 instance type
)

result = predictor.predict({"inputs":"audio.wav"})

But the predict function throws an Exception:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "[Errno 2] No such file or directory: \u0027audio.wav\u0027"

The audio.wav file is in the same folder as the Jupyter notebook. I can’t figure out the reason for this Exception. Any clue what might be going wrong? Thanks!

@parasmehta you are sending the string “audio.wav” as input, not any audio data at all. You can find documentation on how the predictor works here: Predictors — sagemaker 2.80.0 documentation

@philschmid Are you sure that’s the problem? Because the Exception says that it couldn’t find the file or folder, so it expects a file. The documentation for pipelines also says that inputs can be either a filename, bytes or a numpy ndarray.

Anyways, I tried supplying the predict function with the audio data as bytes and numpy ndarray. But I still get errors.

When I try with bytes:

with open("audio.wav", "rb") as f:
    data = f.read()
    predictor.predict({"inputs": data})


TypeError: Object of type 'bytes' is not JSON serializable
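That TypeError comes from the SageMaker SDK's default JSON serializer, not from the endpoint: Python's json module simply refuses bytes values. A quick stand-alone check with the stdlib, no SageMaker involved:

```python
import json

def is_json_serializable(obj):
    """Return True if obj survives json.dumps, False otherwise."""
    try:
        json.dumps(obj)
        return True
    except TypeError:
        return False

print(is_json_serializable({"inputs": "audio.wav"}))  # True  (a plain string)
print(is_json_serializable({"inputs": b"RIFF...."}))  # False (raw bytes)
```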

When I try with a numpy ndarray:

import torchaudio
speech_array, _sampling_rate = torchaudio.load("audio.wav")
resampler = torchaudio.transforms.Resample(_sampling_rate, 16000)
speech = resampler(speech_array).squeeze().numpy()
result = predictor.predict({"inputs": speech})


An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "We expect a numpy ndarray as input, got `\u003cclass \u0027float\u0027\u003e`"

I am a bit clueless and running out of ideas now.

I am. If you check the documentation of the ASR pipeline, it says:

  • inputs ( np.ndarray or bytes or str or dict ) — The inputs is either:
  • str that is the filename of the audio file; the file will be read at the correct sampling rate to get the waveform using ffmpeg . This requires ffmpeg to be installed on the system.
  • bytes is supposed to be the content of an audio file and is interpreted by ffmpeg in the same way.
  • ( np.ndarray of shape (n, ) of type np.float32 or np.float64 ) Raw audio at the correct sampling rate (no further check will be done)
  • dict form can be used to pass raw audio sampled at an arbitrary sampling_rate and let this pipeline do the resampling. The dict must be in the format {"sampling_rate": int, "raw": np.array} with optionally a "stride": (left: int, right: int) that can ask the pipeline to treat the first left samples and last right samples as ignored in decoding (but used at inference to provide more context to the model). Only use stride with CTC models.

You are providing a str, so SageMaker tries to load audio.wav from disk in the endpoint, where of course it doesn’t exist.
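For a local pipeline call, the dict form from the quoted docs would look like this sketch (one second of placeholder silence at 8 kHz, a made-up example; this only shows the payload shape, not how to get it through the endpoint's JSON serializer):

```python
import numpy as np

# Dict input form from the pipeline docs: the raw waveform plus its actual
# sampling rate, so the pipeline can resample to the model's rate itself.
payload = {
    "sampling_rate": 8000,
    "raw": np.zeros(8000, dtype=np.float32),  # 1 s of silence, placeholder audio
}
```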

But I am happy to share that we are working on a new DLC version, which makes this much easier.

Ah, I see. Any ideas then why sending bytes or numpy ndarray doesn’t work?