Deploying OpenAI's Whisper on SageMaker

Hi @thuangster , did you upload a new model.tar.gz with a requirements.txt file?


Hi @razvanp, thank you for the reply.
I did upload a new model.tar.gz. Then I downloaded model.tar.gz, unzipped it, created a folder named "code", and added requirements.txt inside with transformers==4.23.1.
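So the archive layout ended up looking like this (the model files besides code/requirements.txt depend on the checkpoint):

model.tar.gz
├── config.json
├── pytorch_model.bin
├── ...
└── code/
    └── requirements.txt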

After that I tried the following:

!wget https://cdn-media.huggingface.co/speech_samples/sample1.flac
audio_path = "sample1.flac"
res = predictor.predict(data=audio_path)
print(res)

I am getting

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027whisper\u0027"
}

Then I tried this:

from transformers.pipelines.audio_utils import ffmpeg_read

SAMPLING_RATE = 1000  # note: Whisper models expect 16 kHz audio (16000)
with open(audio_path, "rb") as file:
    audio_file = file.read()
audio_nparray = ffmpeg_read(audio_file, SAMPLING_RATE)
predictor.predict({
    "raw": audio_nparray,
    "sampling_rate": SAMPLING_RATE
})

Now I am getting

/opt/conda/lib/python3.7/site-packages/sagemaker/predictor.py in _create_request_args(self, data, initial_args, target_model, target_variant, inference_id)
    197             args["InferenceId"] = inference_id
    198 
--> 199         data = self.serializer.serialize(data)
    200 
    201         args["Body"] = data

/opt/conda/lib/python3.7/site-packages/sagemaker/serializers.py in serialize(self, data)
    390             return data
    391 
--> 392         raise ValueError(f"Object of type {type(data)} is not Data serializable.")

ValueError: Object of type <class 'dict'> is not Data serializable.

I am new to this so I might be asking dumb questions. Any help will be appreciated. Thanks.

Let's take them one at a time: when you get this error "message": "\u0027whisper\u0027", it is because of the wrong Transformers version. You need to solve that first.
If you use audio_serializer = DataSerializer(content_type='audio/x-audio') as provided in the example by @thusken, you don't need to do the serialization yourself. Just use

import sagemaker
from sagemaker.serializers import DataSerializer

audio_serializer = DataSerializer(content_type='audio/x-audio')

predictor = sagemaker.predictor.Predictor(
    endpoint_name=endpoint_name,
    serializer=audio_serializer,
)

# Make a prediction using the Predictor object
prediction = predictor.predict(data=audio_path)

Thank you @razvanp, I have been busy with work and school and just got time to play with this today. I finally got the Transformers version correct and am able to make a prediction.

My final goal is to make subtitles for mp4 videos. Would it be better if I:

A) use something like FFmpeg on AWS Lambda (e.g. the lambda-video npm package) to extract audio from the mp4, put it in S3, and then call the SageMaker endpoint with the S3 path, or

B) write a custom inference script that downloads the mp4 from S3, then uses ffmpeg to extract the audio and make a prediction?

I am still trying to learn how to use all these tools; any guidance will be appreciated. Thanks.
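For reference, here is roughly the extraction step I have in mind for either option (an untested sketch; it assumes ffmpeg is available on the PATH):

import subprocess

def extract_audio(mp4_path, wav_path):
    # strip the video stream (-vn) and re-encode the audio track
    # as 16 kHz mono WAV, which is the sample rate Whisper expects
    subprocess.run(
        ["ffmpeg", "-i", mp4_path, "-vn", "-ac", "1", "-ar", "16000", wav_path],
        check=True,
    )

extract_audio("input.mp4", "audio.wav")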

Hi @thuangster , it seems that the Hugging Face version of Whisper is not really maintained: the 30-second limit on audio makes it quite useless. Currently I am looking for some other options.

Thanks for the reply. If the 30-second limit cannot be resolved, then it is quite useless. Please let me know if you find a better option.
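One thing I have seen mentioned: the transformers ASR pipeline itself can chunk longer audio via chunk_length_s, though using that on the endpoint would need a custom inference script. A minimal local sketch, assuming transformers and ffmpeg are installed:

from transformers import pipeline

# chunk_length_s splits long audio into overlapping 30-second windows,
# so transcription is not capped at 30 seconds
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",
    chunk_length_s=30,
)
print(asr("long_audio.flac")["text"])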

Hi, I have been following along and have managed to deploy the "whisper-small" model with the requirements file. However, I am getting this error when I try to predict using the predictor.

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027str\u0027 object has no attribute \u0027pop\u0027"
}

This is my code for deployment and prediction. Any help will be appreciated.

from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serializers import DataSerializer

huggingface_model = HuggingFaceModel(
    model_data=s3_location,
    role=role,
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    env={
        'HF_MODEL_ID': 'openai/whisper-small',
        'HF_TASK': 'automatic-speech-recognition'
    }
)
audio_serializer = DataSerializer(content_type='audio/x-audio')

whisper_predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge',
    serializer=audio_serializer,
)
audio_path = "sample1.flac"
result = whisper_predictor.predict(data=audio_path)

Could you try using the latest version? We also have a sample here: notebooks/sagemaker-notebook.ipynb at main · huggingface/notebooks · GitHub

Thanks for the reply @philschmid. Do you mean the transformers version in the requirements file? I have just tried changing the versions in the HuggingFaceModel as in the sample notebook, but I ran into the same error "message": "\u0027str\u0027 object has no attribute \u0027pop\u0027".

huggingface_model = HuggingFaceModel(
    model_data=s3_location,
    role=role,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
    env={
        'HF_MODEL_ID': 'openai/whisper-large',
        'HF_TASK': 'automatic-speech-recognition'
    }
)

audio_serializer = DataSerializer(content_type='audio/x-audio')

# deploy the endpoint
whisper_predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.2xlarge",
    serializer=audio_serializer,
    endpoint_name="whisper-large"
)

Edit: I have found the error. When getting the predictor back, I did not add the serializer.

predictor = sagemaker.predictor.Predictor(
    endpoint_name=endpoint_name,
    serializer=audio_serializer,
)

With

predictor.predict({
    'inputs': "sample1.flac"
})

I am getting the same error. How did you fix it?

Found the fix: you need to add this to the config:

    env={
        'HF_TASK': 'automatic-speech-recognition'
    }

Without this, the model is unable to identify the given task.


I was able to bypass the max output limit, which is set in "generation_config.json" as max_length: 448; you can update this to some bigger number like 6000000.
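Roughly, something like this on the unpacked model directory before re-creating model.tar.gz (a sketch; "model_dir" is a placeholder path):

from transformers import GenerationConfig

# load the generation config shipped with the model, raise max_length,
# and write it back into the local model directory before re-tarring
gen_cfg = GenerationConfig.from_pretrained("model_dir")
gen_cfg.max_length = 6000000
gen_cfg.save_pretrained("model_dir")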

I am getting a warning afterwards:

UserWarning: Neither `max_length` nor `max_new_tokens` has been set, `max_length` will default to 36000 (`generation_config.max_length`). Controlling `max_length` via the config is deprecated and `max_length` will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.

The issue I am getting then is error 413 (request entity too large), which is related to how SageMaker works and has nothing to do with the HF model.

Currently, the working code streams the file directly to the endpoint using its local path; so the solution is rather to pass an S3 path and download and process the file inside the endpoint, which needs a custom "inference.py".
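Something like this in code/inference.py could work (an untested sketch; it assumes the request body is JSON with an s3_uri field, and the bucket/key are placeholders):

# code/inference.py -- untested sketch
import json
import boto3
from transformers import pipeline

def model_fn(model_dir):
    # load the model that SageMaker unpacked from model.tar.gz
    return pipeline("automatic-speech-recognition", model=model_dir)

def transform_fn(model, input_data, content_type, accept):
    # expect a JSON body like {"s3_uri": "s3://some-bucket/audio/sample1.flac"}
    payload = json.loads(input_data)
    uri = payload["s3_uri"].replace("s3://", "", 1)
    bucket, _, key = uri.partition("/")
    local_path = "/tmp/" + key.split("/")[-1]
    boto3.client("s3").download_file(bucket, key, local_path)
    return json.dumps(model(local_path)), accept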

Let me know if anyone is still working on this.

Hey @MLLife - note that max_length refers to the maximum token length generated per batch. Whisper has an intrinsic upper limit of 448 (which should be plenty!)
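You can verify this from the checkpoint's generation config (assuming transformers is installed):

from transformers import GenerationConfig

# Whisper checkpoints ship with max_length = 448 in generation_config.json
print(GenerationConfig.from_pretrained("openai/whisper-small").max_length)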

Is there any example like this but with the Whisper model, one that can transcribe more than 30 seconds of audio? Thanks.

I too got this error for an image-to-text model; I did the serialization for the image data, but the error is still not solved.

I cannot provide the configuration settings of the whisper-large-v3 model while deploying. I want to specify batch size, language, etc.