Hi Diego
When you use a custom inference script, you are leveraging the SageMaker Hugging Face Inference Toolkit. The cool part is that this toolkit actually uses the pipelines API in the background, see here.
What that means for you is that you don't have to write an inference script at all. Instead, you can pass additional parameters when calling the endpoint, like so (this example is for text generation, but the same principle applies in your case):
import json

import boto3
import streamlit as st

sagemaker_runtime = boto3.client("sagemaker-runtime")

prompt = st.text_area("Enter your prompt here:")

# temp, rep_penalty, and endpoint_name are assumed to be defined elsewhere in your app
params = {
    "return_full_text": True,
    "temperature": temp,
    "min_length": 50,
    "max_length": 100,
    "do_sample": True,
    "repetition_penalty": rep_penalty,
    "top_k": 20,
}

payload = {"inputs": prompt, "parameters": params}

response = sagemaker_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(payload),
)
Try it out: call the endpoint with the chunk_length_s
parameter in the same way, and that should work.
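As a minimal sketch of what that could look like for your case: the payload follows the same "inputs" / "parameters" pattern, with chunk_length_s passed under "parameters". Note that the audio placeholder and the chunk length value here are just illustrative assumptions; adapt "inputs" to however your endpoint expects the audio (raw bytes, base64 string, or an S3 URI, depending on your serializer).

```python
import json

# Hypothetical placeholder for the audio input -- replace with whatever
# format your endpoint's serializer expects.
audio_input = "s3://my-bucket/recording.wav"

# Same payload pattern as the text-generation example, but with the
# pipeline's chunk_length_s parameter (here: 30-second chunks).
payload = {
    "inputs": audio_input,
    "parameters": {"chunk_length_s": 30},
}

body = json.dumps(payload)

# body would then be passed as Body=... to sagemaker_runtime.invoke_endpoint,
# exactly as in the example above.
print(body)
```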
Hope that helps!
Cheers
Heiko