I am working on deploying a speech-recognition app using Hugging Face, following the instructions here. My understanding is that the inference toolkit uses pipelines, but speech recognition is only introduced in the releases after 4.9.0, whereas the current AWS images point to 4.6.x.
Is there any way around this? What would you suggest I do to make the deployment work? My hunch is that I need to supply a new image_uri.
Great to hear that you are working on a speech task! Yes, the inference toolkit uses the pipelines from transformers. The code is open source if you want to take a deeper look: GitHub - aws/sagemaker-huggingface-inference-toolkit.
I am happy to share that we are working on new releases of the DLC, which include 4.9 and higher. Sadly, I think it will take around 2 more weeks for them to be available.
In the meantime, you could use the official DLC and provide as model_data a model.tar.gz which contains a custom module, documented here: Deploy models to Amazon SageMaker.
With a custom module, you can provide a requirements.txt to upgrade the dependencies and an inference.py with a custom model_fn to load the ASR pipeline.
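For example, a minimal sketch (untested, and the version pin is just an example), with a requirements.txt and an inference.py placed in the code/ folder of your model.tar.gz:

# code/requirements.txt would pin a newer version, e.g.:
#   transformers>=4.9.0

# code/inference.py
from transformers import pipeline

def model_fn(model_dir):
    # load the ASR pipeline from the model artifacts unpacked into model_dir
    return pipeline("automatic-speech-recognition", model=model_dir)

def predict_fn(data, asr_pipeline):
    # the decoded request arrives as a dict; the audio is expected under "inputs"
    audio = data.pop("inputs", data)
    return asr_pipeline(audio)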
Thank you Philipp for the quick response and the direction. I will try the model_data route. It’s also great to hear that you are working on updating the DLCs. The team’s work is always much appreciated!
Best
Deniz
Hi @philschmid, I am trying to do exactly the same thing as @dzorlu, so I was pleasantly surprised to see that someone else asked this question a few months ago. I am trying to deploy a speech recognition model from the Hugging Face model hub for inference on a SageMaker notebook using the Inference Toolkit, following the instructions here: https://huggingface.co/docs/sagemaker/inference#deploy-a-model-from-the-%F0%9F%A4%97-hub. However, when I do predictor.predict({'inputs': "audio.wav"}), I get the error Unknown task automatic-speech-recognition .... I guess this is because of the transformers version in the DLC?
If so, is there now a way to use a newer version than 4.6.1 on SageMaker so that I can use the Inference Toolkit to deploy an ASR model from the hub? Thanks in advance!
Currently, ASR is not yet supported for zero-code deployments.
We are working on shipping it with the next version of the inference toolkit, hopefully within 2-4 weeks. Until then, you can create a custom inference.py for it.
You can follow this example: notebooks/sagemaker-notebook.ipynb at master · huggingface/notebooks · GitHub. Since you are using audio data, you additionally need to adjust the input_fn.
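For audio data, such an input_fn could look roughly like this (a sketch only: it assumes the client sends raw audio bytes with an audio/* content type, and soundfile would need to be added to the requirements.txt):

import io

import numpy as np
import soundfile as sf  # extra dependency, add it to requirements.txt

def input_fn(request_body, content_type):
    # turn raw audio bytes into the dict format the ASR pipeline understands
    if content_type and content_type.startswith("audio/"):
        raw, sampling_rate = sf.read(io.BytesIO(request_body))
        return {"inputs": {"raw": np.asarray(raw, dtype=np.float32), "sampling_rate": sampling_rate}}
    raise ValueError(f"Unsupported content type: {content_type}")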
Thank you for your answers @philschmid and @marshmellow77
I was very happy to see the list of SageMaker DLCs, but then I realised that the version of transformers I want to use (v4.15.0) is not yet supported in the latest DLCs; they currently support up to transformers 4.12.3.
So, what are my options if I want to use a model from transformers v4.15.0 for inference on SageMaker? If you could point me to any tutorials/documentation on this, that would be helpful too. Thanks!
from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'facebook/wav2vec2-xlsr-53-espeak-cv-ft',
    'HF_TASK': 'automatic-speech-recognition'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    image_uri='763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference:1.10.2-transformers4.17.0-cpu-py38-ubuntu20.04-v1.0',
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,    # number of instances
    instance_type='ml.m5.xlarge' # ec2 instance type
)

result = predictor.predict({"inputs": "audio.wav"})
print(result)
But the predict function throws an Exception:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": "[Errno 2] No such file or directory: \u0027audio.wav\u0027"
The audio.wav file is in the same folder as the Jupyter notebook. I can’t figure out the reason for this Exception. Any clue what might be going wrong? Thanks!
@philschmid Are you sure that’s the problem? The Exception says that it couldn’t find the file or folder, so it does expect a file. The documentation for pipelines also says that the inputs can be either a filename, bytes, or a numpy ndarray (Pipelines.call).
Anyway, I tried supplying the predict function with the audio data as bytes and as a numpy ndarray, but I still get errors.
When I try with bytes:
Code:
with open("audio.wav", "rb") as f:
data = f.read()
predictor.predict({"inputs": data})
Exception:
TypeError: Object of type 'bytes' is not JSON serializable
When I try with a numpy ndarray, I instead get this error back from the endpoint:
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": "We expect a numpy ndarray as input, got `\u003cclass \u0027float\u0027\u003e`"
}
If you check the documentation of the ASR Pipeline, it says:
inputs (np.ndarray or bytes or str or dict) — The inputs is either:
str that is the filename of the audio file, the file will be read at the correct sampling rate to get the waveform using ffmpeg. This requires ffmpeg to be installed on the system.
bytes it is supposed to be the content of an audio file and is interpreted by ffmpeg in the same way.
(np.ndarray of shape (n,) of type np.float32 or np.float64) Raw audio at the correct sampling rate (no further check will be done)
dict form can be used to pass raw audio sampled at arbitrary sampling_rate and let this pipeline do the resampling. The dict must be in the format {"sampling_rate": int, "raw": np.array} with optionally a "stride": (left: int, right: int) that can ask the pipeline to treat the first left samples and last right samples to be ignored in decoding (but used at inference to provide more context to the model). Only use stride with CTC models.
You are providing a str, so SageMaker tries to load audio.wav from disk on the endpoint, where of course it doesn’t exist.
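Locally, the dict form from the docs above would look something like this (just a sketch to illustrate it; the file is read on the machine where it actually exists):

import soundfile as sf
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-xlsr-53-espeak-cv-ft")

# decode the file locally and pass raw audio plus its sampling rate to the pipeline
raw, sampling_rate = sf.read("audio.wav")
print(asr({"raw": raw, "sampling_rate": sampling_rate}))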
But I am happy to share that we are working on a new DLC version, which will make this much easier.
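Until then, the client side of the custom inference.py approach sketched above could send the raw bytes instead of a path, for example with the DataSerializer from the SageMaker Python SDK, reusing the predictor from the code above (again only a sketch; the content type has to match whatever your input_fn expects):

from sagemaker.serializers import DataSerializer

# send the audio as raw bytes instead of a file path; the endpoint decodes it in input_fn
predictor.serializer = DataSerializer(content_type="audio/x-audio")

with open("audio.wav", "rb") as f:
    audio_bytes = f.read()

print(predictor.predict(audio_bytes))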