Deploying OpenAI's Whisper on SageMaker

Hi, @yugaljain1999.
I think it's a limitation of Whisper. As far as I understand, Whisper only deals with audio up to 30 seconds long.
One way around this is to customize inference.py and handle that limitation with your own code.
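For anyone looking for a starting point, below is a minimal, untested sketch of such a custom inference.py. The model_fn/predict_fn names follow the SageMaker Hugging Face inference toolkit conventions, and the chunk_length_s argument assumes a transformers version that includes the chunking support discussed later in this thread:

from transformers import pipeline

def model_fn(model_dir):
    # load the Whisper model from the unpacked model artifacts
    return pipeline("automatic-speech-recognition", model=model_dir)

def predict_fn(data, asr_pipeline):
    # chunk_length_s splits longer audio into 30-second windows and
    # stitches the transcriptions back together
    return asr_pipeline(data, chunk_length_s=30)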


@sohoha is correct. The current Hugging Face implementation of Whisper only supports 30 seconds of audio, although they are working on supporting longer files. See this issue and this PR.


OK @sohoha @thusken,
So for now, if the audio is longer than 30 seconds, we should chunk the original audio into 30-second segments and then apply the Hugging Face whisper-tiny model, right?

And I have one more query: can we also get timestamp attributes in the output, like the output from the openai/whisper model? For example, "no_speech_prob", "text", "token ids", "temperature", "seek", "compression_ratio", "avg_log_prob", "start" and "end"?

Thanks


I see that I pasted the wrong link in my previous answer; this is the PR that I wanted to link.

That PR has been merged, so it will probably be available in an upcoming version of the transformers library. Although it's unclear to me whether the PR also adds timestamps.
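In case it helps, here is a rough sketch of what usage could look like once a transformers release includes that PR. The chunk_length_s and return_timestamps arguments are my assumption of how the new support is exposed, not a confirmed API:

from transformers import pipeline

# assumes a transformers version that includes the merged chunking support
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")

# chunk_length_s handles audio longer than 30 seconds;
# return_timestamps (if supported) requests start/end times per segment
result = asr("long-audio-file.mp3", chunk_length_s=30, return_timestamps=True)
print(result["text"])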

I'm facing a similar problem when deploying the flan-t5-xl model; however, I continue to get the error despite adding a requirements.txt with transformers==4.26.0 to the tar.

@marshmellow77 @thusken would be grateful for some help!

@rlekhwani-umass I am planning to create an example. I'll post it here once it is ready.

@rlekhwani-umass - I was able to deploy flan-t5-xxl to a SageMaker endpoint with this notebook

Thank you @marshmellow77.

I created a somewhat more detailed version: Deploy FLAN-T5 XXL on Amazon SageMaker


ugh … @philschmid is always one-upping me :stuck_out_tongue_winking_eye:

I tried with the DataSerializer but still get the same error. Could you paste your code, @thusken?

I got:
We expect a numpy ndarray as input, got <class 'list'>

Hi @rpinto! It’s been a while since I tried this, but my code for deployment and inference was roughly as follows:

from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.serializers import DataSerializer

# create a serializer for the data
audio_serializer = DataSerializer(content_type='audio/x-audio') # using x-audio to support multiple audio formats

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=s3_location,       # path to your model and script
    role=role,                    # iam role with permissions to create an Endpoint
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',           # python version used
    env={
        'HF_MODEL_ID':'openai/whisper-large',
        'HF_TASK':'automatic-speech-recognition'
    }
)


# deploy the model to a SageMaker endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.p2.xlarge",
    serializer=audio_serializer
    )


audio_path = "path-to-an-audio-file.mp3"
result = predictor.predict(data=audio_path)

The rest of the deployment steps are as discussed earlier in the thread, for example in this reply from @sohoha: Deploying OpenAI's Whisper on SageMaker - #12 by sohoha


@thusken - thank you very much for your support. One question: how can I obtain a “predictor” instance without deploying the model again?

Typically you will call the SageMaker endpoint using the invoke_endpoint method, e.g.

import json
import boto3

aws = boto3.Session(
    region_name="aws-region",
    aws_access_key_id="access-key-goes-here",
    aws_secret_access_key="secret-key-goes-here"
)
runtime = aws.client('runtime.sagemaker')
response = runtime.invoke_endpoint(
    EndpointName="endpoint_name",
    ContentType="audio/x-audio",  # matches the serializer used at deployment
    Body=audio_file
)

# the transcription comes back in the response body
result = json.loads(response["Body"].read())

Thank you. I notice a lot of references to the “requirements.txt” in the post. Can you provide an example, please?

The requirements.txt is a file with just one line specifying the transformers version you need. I would recommend reading this post from earlier in the thread.
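For example, to pin the version mentioned earlier in this thread, the entire file would be this single line:

transformers==4.26.0

With the usual SageMaker Hugging Face packaging layout, it sits next to inference.py in the code/ directory of the model archive.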


Re obtaining the predictor without deploying again: you can just create a new instance of the HuggingFacePredictor class and provide the endpoint name: Hugging Face — sagemaker 2.135.0 documentation


Hi @marshmellow77, thank you for your reply. Should the method provided by @thusken also work? Can you provide an example, please?

Thank you,

Razvan

Yes, both methods will work. @thusken used boto3, which is a lower-level API and allows for more granular control. I use the Python SDK, which abstracts certain layers away from the user and is (for me, anyway) easier to use.

An example for this is straightforward:

from sagemaker.huggingface.model import HuggingFacePredictor
predictor = HuggingFacePredictor("<endpoint_name>")
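If the endpoint expects raw audio, you will likely also want to re-attach the serializer (the serializer argument below mirrors the DataSerializer used at deployment earlier in the thread):

from sagemaker.huggingface.model import HuggingFacePredictor
from sagemaker.serializers import DataSerializer

predictor = HuggingFacePredictor(
    "<endpoint_name>",
    serializer=DataSerializer(content_type='audio/x-audio')
)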

Hi @marshmellow77, thank you again for the reply. I managed to make it work. However, I am looking for a way to get rid of the 30-second limitation. Is there any way to do this? If that PR was merged, shouldn't the feature already be available?

Thank you for this helpful post.
I tried following @thusken's post #31, but I am still getting the following error.
Is this due to the transformers_version being 4.17.0, or something else? Any help would be appreciated.

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "'whisper'"
}"