Deploying OpenAI's Whisper on SageMaker

Hi, @yugaljain1999.
I think it's a limitation of Whisper. As far as I understand, Whisper only deals with audio up to 30 seconds long.
One way around this is to customize inference.py and handle that limitation with your own code.
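For anyone looking for a starting point, below is a minimal, untested sketch of such a custom inference.py. The model_fn/predict_fn names follow the SageMaker Hugging Face inference toolkit conventions, and the chunk_length_s argument assumes a transformers version that includes the chunking support discussed later in this thread:

from transformers import pipeline

def model_fn(model_dir):
    # load the Whisper model from the unpacked model artifacts
    return pipeline("automatic-speech-recognition", model=model_dir)

def predict_fn(data, asr_pipeline):
    # chunk_length_s splits longer audio into 30-second windows and
    # stitches the transcriptions back together
    return asr_pipeline(data, chunk_length_s=30)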


@sohoha is correct. The current Hugging Face implementation of Whisper only supports 30 seconds of audio, although they are working on supporting longer files. See this issue and this PR.


OK @sohoha @thusken,
So for now, if the audio is longer than 30 seconds, we should chunk the original audio into 30-second segments and then apply the Hugging Face whisper-tiny model, right?

And I have one more query: can we also get timestamp attributes in the output, like the output from the openai/whisper model? For example, "no_speech_prob", "text", "token ids", "temperature", "seek", "compression_ratio", "avg_log_prob", "start" and "end"?

Thanks


I see that I pasted the wrong link in my previous answer; this is the PR that I wanted to link.

That PR has been merged, so it will probably be available in an upcoming version of the transformers library. Although it's unclear to me whether the PR also adds timestamps.
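In case it helps, here is a rough sketch of what usage could look like once a transformers release includes that PR. The chunk_length_s and return_timestamps arguments are my assumption of how the new support is exposed, not a confirmed API:

from transformers import pipeline

# assumes a transformers version that includes the merged chunking support
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")

# chunk_length_s handles audio longer than 30 seconds;
# return_timestamps (if supported) requests start/end times per segment
result = asr("long-audio-file.mp3", chunk_length_s=30, return_timestamps=True)
print(result["text"])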

I'm facing a similar problem when deploying the flan-t5-xl model; however, I continue to get the error despite adding a requirements.txt with transformers==4.26.0 to the tar.

@marshmellow77 @thusken would be grateful for some help!

@rlekhwani-umass I am planning to create an example. I'll post it here once it is ready.

@rlekhwani-umass - I was able to deploy flan-t5-xxl to a SageMaker endpoint with this notebook

Thank you @marshmellow77.

I created a somewhat more detailed version: Deploy FLAN-T5 XXL on Amazon SageMaker


ugh … @philschmid is always one-upping me :stuck_out_tongue_winking_eye:

I tried with the DataSerializer but still get the same error. Could you paste your code, @thusken?

I got:
We expect a numpy ndarray as input, got <class 'list'>

Hi @rpinto! It’s been a while since I tried this, but my code for deployment and inference was roughly as follows:

from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.serializers import DataSerializer

# create a serializer for the data
audio_serializer = DataSerializer(content_type='audio/x-audio') # using x-audio to support multiple audio formats

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=s3_location,       # path to your model and script
    role=role,                    # iam role with permissions to create an Endpoint
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',           # python version used
    env={
        'HF_MODEL_ID':'openai/whisper-large',
        'HF_TASK':'automatic-speech-recognition'
    }
)


# deploy the model to a SageMaker endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.p2.xlarge",
    serializer=audio_serializer
    )


audio_path = "path-to-an-audio-file.mp3"
result = predictor.predict(data=audio_path)

The rest of the deployment steps are as discussed earlier in the thread, for example in this reply from @sohoha: Deploying OpenAI's Whisper on SageMaker - #12 by sohoha


@thusken - thank you very much for your support. One question: how can I obtain a “predictor” instance without deploying the model again?

Typically you will call the SageMaker endpoint using the invoke_endpoint method, e.g.

import json
import boto3

aws = boto3.Session(
    region_name="aws-region",
    aws_access_key_id="access-key-goes-here",
    aws_secret_access_key="secret-key-goes-here"
)
runtime = aws.client('runtime.sagemaker')
response = runtime.invoke_endpoint(
    EndpointName="endpoint_name",
    ContentType="audio/x-audio",  # matches the serializer used at deployment
    Body=audio_file
)

# the transcription comes back in the response body
result = json.loads(response["Body"].read())

Thank you. I notice a lot of references to the “requirements.txt” in the post. Can you provide an example, please?

The requirements.txt is a file with just one line specifying the transformers version you need. I would recommend reading this post from earlier in the thread.
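For example, to pin the version mentioned earlier in this thread, the entire file would be this single line:

transformers==4.26.0

With the usual SageMaker Hugging Face packaging layout, it sits next to inference.py in the code/ directory of the model archive.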


Re obtaining the predictor without deploying again: you can just create a new instance of the HuggingFacePredictor class and provide the endpoint name: Hugging Face — sagemaker 2.135.0 documentation


Hi @marshmellow77, thank you for your reply. Should the method provided by @thusken also work? Can you provide an example, please?

Thank you,

Razvan

Yes, both methods will work. @thusken used boto3, which is a lower-level API and allows for more granular control. I use the Python SDK, which abstracts certain layers away from the user and is (for me, anyway) easier to use.

An example for this is straightforward:

from sagemaker.huggingface.model import HuggingFacePredictor
predictor = HuggingFacePredictor("<endpoint_name>")
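If the endpoint expects raw audio, you will likely also want to re-attach the serializer (the serializer argument below mirrors the DataSerializer used at deployment earlier in the thread):

from sagemaker.huggingface.model import HuggingFacePredictor
from sagemaker.serializers import DataSerializer

predictor = HuggingFacePredictor(
    "<endpoint_name>",
    serializer=DataSerializer(content_type='audio/x-audio')
)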

Hi @marshmellow77, thank you again for the reply. I managed to make it work. However, I am looking for a way to get rid of the 30-second limitation. Is there any way to do this? If that PR was merged, shouldn't the feature already be available?

Thank you for this helpful post.
I tried following @thusken's post #31, but I am still getting the following error.
Is this due to the transformers_version being 4.17.0, or something else? Any help would be appreciated.

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "'whisper'"
}"