Deploying OpenAI's Whisper on SageMaker

Hi @thuangster , did you upload a new model.tar.gz with a requirements.txt file?


Hi @razvanp, thank you for the reply.
I did upload a new model.tar.gz. Then I downloaded model.tar.gz, unzipped it, created a folder named "code", and added requirements.txt inside with transformers==4.23.1.
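So the archive layout ended up looking like this (the model files besides code/requirements.txt depend on the checkpoint):

model.tar.gz
├── config.json
├── pytorch_model.bin
├── ...
└── code/
    └── requirements.txt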

After that I tried the following:

!wget https://cdn-media.huggingface.co/speech_samples/sample1.flac
audio_path = "sample1.flac"
res = predictor.predict(data=audio_path)
print(res)

I am getting

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027whisper\u0027"
}

Then I tried this:

from transformers.pipelines.audio_utils import ffmpeg_read

SAMPLING_RATE = 1000  # note: Whisper models expect 16 kHz audio (16000)
with open(audio_path, "rb") as file:
    audio_file = file.read()
audio_nparray = ffmpeg_read(audio_file, SAMPLING_RATE)
predictor.predict({
    "raw": audio_nparray,
    "sampling_rate": SAMPLING_RATE
})

Now I am getting

/opt/conda/lib/python3.7/site-packages/sagemaker/predictor.py in _create_request_args(self, data, initial_args, target_model, target_variant, inference_id)
    197             args["InferenceId"] = inference_id
    198 
--> 199         data = self.serializer.serialize(data)
    200 
    201         args["Body"] = data

/opt/conda/lib/python3.7/site-packages/sagemaker/serializers.py in serialize(self, data)
    390             return data
    391 
--> 392         raise ValueError(f"Object of type {type(data)} is not Data serializable.")

ValueError: Object of type <class 'dict'> is not Data serializable.

I am new to this so I might be asking dumb questions. Any help will be appreciated. Thanks.

Let's take them one at a time: when you get this error "message": "\u0027whisper\u0027", it is because of the wrong Transformers version. You need to solve that first.
If you use audio_serializer = DataSerializer(content_type='audio/x-audio') as provided in the example by @thusken, you don't need to do the serialization yourself. Just use

import sagemaker
from sagemaker.serializers import DataSerializer

audio_serializer = DataSerializer(content_type='audio/x-audio')

predictor = sagemaker.predictor.Predictor(
    endpoint_name=endpoint_name,
    serializer=audio_serializer,
)

# Make a prediction using the Predictor object
prediction = predictor.predict(data=audio_path)

Thank you @razvanp, I have been busy with work and school and just got time to play with this today. I finally got the Transformers version correct and am able to make a prediction.

My final goal is to make subtitles for mp4 videos. Would it be better if I:

A) use something like FFmpeg on AWS Lambda (e.g. the lambda-video npm package) to extract audio from the mp4, put it in S3, and then call the SageMaker endpoint with the S3 path, or

B) write a custom inference script that downloads the mp4 from S3, then uses ffmpeg to extract the audio and make a prediction?

I am still trying to learn how to use all these tools; any guidance will be appreciated. Thanks.
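For reference, here is roughly the extraction step I have in mind for either option (an untested sketch; it assumes ffmpeg is available on the PATH):

import subprocess

def extract_audio(mp4_path, wav_path):
    # strip the video stream (-vn) and re-encode the audio track
    # as 16 kHz mono WAV, which is the sample rate Whisper expects
    subprocess.run(
        ["ffmpeg", "-i", mp4_path, "-vn", "-ac", "1", "-ar", "16000", wav_path],
        check=True,
    )

extract_audio("input.mp4", "audio.wav")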

Hi @thuangster , it seems that the Hugging Face version of Whisper is not really maintained: the 30-second limit on audio makes it quite useless. Currently I am looking for some other options.

Thanks for the reply. If the 30-second limit cannot be resolved, then it is quite useless. Please let me know if you find a better option.
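One thing I have seen mentioned: the transformers ASR pipeline itself can chunk longer audio via chunk_length_s, though using that on the endpoint would need a custom inference script. A minimal local sketch, assuming transformers and ffmpeg are installed:

from transformers import pipeline

# chunk_length_s splits long audio into overlapping 30-second windows,
# so transcription is not capped at 30 seconds
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",
    chunk_length_s=30,
)
print(asr("long_audio.flac")["text"])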

Hi, I have been following along and have managed to deploy the "whisper-small" model with the requirements file. However, I am getting this error when I try to predict using the predictor.

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027str\u0027 object has no attribute \u0027pop\u0027"
}

This is my code for deployment and prediction. Any help will be appreciated.

from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serializers import DataSerializer

huggingface_model = HuggingFaceModel(
    model_data=s3_location,
    role=role,
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    env={
        'HF_MODEL_ID': 'openai/whisper-small',
        'HF_TASK': 'automatic-speech-recognition'
    }
)
audio_serializer = DataSerializer(content_type='audio/x-audio')

whisper_predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge',
    serializer=audio_serializer,
)
audio_path = "sample1.flac"
result = whisper_predictor.predict(data=audio_path)

Could you try using the latest version? We also have a sample here: notebooks/sagemaker-notebook.ipynb at main · huggingface/notebooks · GitHub

Thanks for the reply @philschmid. Do you mean the transformers version in the requirements file? I have just tried changing the versions in the HuggingFaceModel as in the sample notebook, but I ran into the same error "message": "\u0027str\u0027 object has no attribute \u0027pop\u0027".

huggingface_model = HuggingFaceModel(
    model_data=s3_location,
    role=role,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
    env={
        'HF_MODEL_ID': 'openai/whisper-large',
        'HF_TASK': 'automatic-speech-recognition'
    }
)

audio_serializer = DataSerializer(content_type='audio/x-audio')

# deploy the endpoint
whisper_predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.2xlarge",
    serializer=audio_serializer,
    endpoint_name="whisper-large"
)

Edit: I have found the error. When getting the predictor back, I did not add the serializer.

predictor = sagemaker.predictor.Predictor(
    endpoint_name=endpoint_name,
    serializer=audio_serializer,
)

With

predictor.predict({
    'inputs': "sample1.flac"
})

I am getting the same error. How did you fix it?

Found the fix: you need to add this to the config:

    env={
        'HF_TASK': 'automatic-speech-recognition'
    }

Without this, the model is unable to identify the given task.


I was able to bypass the max output limit, which is set in "generation_config.json" as max_length: 448; you can update this to some bigger number like 6000000.
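Roughly, something like this on the unpacked model directory before re-creating model.tar.gz (a sketch; "model_dir" is a placeholder path):

from transformers import GenerationConfig

# load the generation config shipped with the model, raise max_length,
# and write it back into the local model directory before re-tarring
gen_cfg = GenerationConfig.from_pretrained("model_dir")
gen_cfg.max_length = 6000000
gen_cfg.save_pretrained("model_dir")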

I am getting a warning afterwards:

UserWarning: Neither `max_length` nor `max_new_tokens` has been set, `max_length` will default to 36000 (`generation_config.max_length`). Controlling `max_length` via the config is deprecated and `max_length` will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.

The issue I am getting then is error 413 (request entity too large), which is related to how SageMaker works and has nothing to do with the HF model.

Currently, the working code streams the file directly to the endpoint using its local path; so the solution is rather to pass an S3 path and download and process the file inside the endpoint, which needs a custom "inference.py".
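Something like this in code/inference.py could work (an untested sketch; it assumes the request body is JSON with an s3_uri field, and the bucket/key are placeholders):

# code/inference.py -- untested sketch
import json
import boto3
from transformers import pipeline

def model_fn(model_dir):
    # load the model that SageMaker unpacked from model.tar.gz
    return pipeline("automatic-speech-recognition", model=model_dir)

def transform_fn(model, input_data, content_type, accept):
    # expect a JSON body like {"s3_uri": "s3://some-bucket/audio/sample1.flac"}
    payload = json.loads(input_data)
    uri = payload["s3_uri"].replace("s3://", "", 1)
    bucket, _, key = uri.partition("/")
    local_path = "/tmp/" + key.split("/")[-1]
    boto3.client("s3").download_file(bucket, key, local_path)
    return json.dumps(model(local_path)), accept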

Let me know if anyone is still working on this.

Hey @MLLife - note that max_length refers to the maximum token length generated per batch. Whisper has an intrinsic upper limit of 448 (which should be plenty!)
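You can verify this from the checkpoint's generation config (assuming transformers is installed):

from transformers import GenerationConfig

# Whisper checkpoints ship with max_length = 448 in generation_config.json
print(GenerationConfig.from_pretrained("openai/whisper-small").max_length)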

Is there any example like this but with the Whisper model, one that can transcribe more than 30 seconds of audio? Thanks.

I too got this error for an image-to-text model; I did the serialization for the image data, but the error is still not solved.

I cannot provide the configuration settings of the whisper-large-v3 model while deploying. I want to specify batch size, language, etc.