Serverless memory problem when deploying Wav2Vec2 with custom inference code

diegoseto · May 16, 2022, 10:12pm

Hello,

I’m facing issues deploying Wav2Vec2 to Amazon SageMaker with the serverless option, but only when I use a custom inference script (passing the path of the model.tar.gz stored in an Amazon S3 bucket). I’m receiving the following memory error:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: “Inference failed due to insufficient memory on the Endpoint. Please add more memory to the endpoint.”

Serverless inference config code (works fine when using the model hosted on the HF Hub):

serverless_config = ServerlessInferenceConfig(memory_size_in_mb=4096, max_concurrency=10)

Custom inference script code:

import json
import torch
from transformers import pipeline


def model_fn(model_dir):
    # Load the ASR pipeline from the unpacked model.tar.gz directory
    pipe = pipeline('automatic-speech-recognition', model_dir, chunk_length_s=10)
    return pipe


def input_fn(json_request_data, content_type='application/json'):
    # Deserialize the JSON request body
    input_data = json.loads(json_request_data)
    return input_data


def predict_fn(input_data, pipe):
    # Run the pipeline on the deserialized input
    result = pipe(input_data)
    return result


def output_fn(transcript, accept='application/json'):
    # Serialize the result and return it with the response content type
    return json.dumps(transcript), accept

Can anyone help me?

philschmid · May 17, 2022, 1:29pm

Could you try a higher memory size configuration, e.g. 6144? Which model are you testing?
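
For reference, a small sketch of a pre-flight check on the memory setting. It assumes the serverless limits that applied around the time of this thread (1 GB steps from 1024 MB up to 6144 MB); the constant and the helper name are illustrative and not part of the SageMaker SDK.

```python
# Assumption: serverless endpoints accept 1 GB increments up to 6 GB
# (the limits in force around May 2022); check the current AWS docs.
VALID_SERVERLESS_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def nearest_valid_memory(requested_mb):
    """Return the smallest valid size >= requested_mb, or the maximum."""
    for size in VALID_SERVERLESS_MEMORY_MB:
        if size >= requested_mb:
            return size
    return VALID_SERVERLESS_MEMORY_MB[-1]


print(nearest_valid_memory(4096))
print(nearest_valid_memory(5000))
print(nearest_valid_memory(7000))
```

The returned value could then be passed as memory_size_in_mb to ServerlessInferenceConfig, as in the config line from the first post.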

diegoseto · May 17, 2022, 3:28pm

Hi @philschmid

I already tried that configuration but got the same error. I’m testing a wav2vec2-large-xlsr model (private).

philschmid · May 18, 2022, 1:12pm

@diegoseto is there a particular reason why you are creating an inference.py script? You can directly provide your HF_API_TOKEN in the hub configuration next to your model id and task. See HF_API_TOKEN

diegoseto · May 18, 2022, 3:14pm

@philschmid yes, the reason is that I want to use a language model and configure the “chunk_length_s” pipeline parameter, but the Amazon SageMaker library offers no option for this without a custom inference script (at least I didn’t find one).

marshmellow77 · May 18, 2022, 4:57pm

Hi Diego

When using a custom inference script you are leveraging the SageMaker Hugging Face Inference Toolkit. Now the cool part is that this toolkit actually uses the pipelines API in the background, see here.

What that means for you is that you actually don’t have to write an inference script. Instead you can provide additional parameters when calling the endpoint, like so (this is an example for text generation, but the same principle applies in your case):

prompt = st.text_area("Enter your prompt here:")

params = {"return_full_text": True,
          "temperature": temp,
          "min_length": 50,
          "max_length": 100,
          "do_sample": True,
          "repetition_penalty": rep_penalty,
          "top_k": 20,
          }

payload = {"inputs": prompt, "parameters": params}

response = sagemaker_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    Body=json.dumps(payload)
)

Try it out and call the endpoint with the chunk_length_s parameter; this should work.

Hope that helps!

Cheers
Heiko
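
Applied to the chunk_length_s suggestion, the request body might be built roughly as follows. This is a sketch only: build_payload is a hypothetical helper, and how the audio itself has to be referenced inside a JSON request is an assumption that depends on the toolkit and transformers versions in the container.

```python
import json


def build_payload(inputs, **parameters):
    """Serialize a request body in the {"inputs": ..., "parameters": ...} shape."""
    payload = {"inputs": inputs}
    if parameters:
        payload["parameters"] = parameters
    return json.dumps(payload)


# Hypothetical placeholder: replace with whatever form of audio input
# the deployed endpoint accepts in a JSON request.
audio_reference = "<audio input accepted by your endpoint>"

body = build_payload(audio_reference, chunk_length_s=10)
print(body)
```

The resulting string would be passed as Body to invoke_endpoint with ContentType='application/json', as in the snippet above. If the audio is sent as raw bytes with an audio content type instead, there is no JSON envelope to carry parameters in, so this approach may not apply.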

diegoseto · May 18, 2022, 5:23pm

Hi @marshmellow77

Cool! I will try this, thanks. What about using a language model at inference time? Is there another option?

marshmellow77 · May 18, 2022, 10:34pm

You mean other than serverless? Yes, there are actually 4 different inference options on SageMaker. @philschmid just released a blog post comparing the different options: https://www.philschmid.de/sagemaker-inference-comparison

diegoseto · May 19, 2022, 3:34pm

No, I mean using a language model to boost Wav2Vec2 decoding, as described by @patrickvonplaten here How to create Wav2Vec2 With Language model, but in Amazon SageMaker (serverless). In this topic @philschmid suggested using a custom inference script, but I’m having problems as mentioned above.

Is there another option to use a language model without a custom inference script?

philschmid · May 20, 2022, 7:54am

What is the model size of your custom model? And how are you creating the model.tar.gz? It might be that the archive size or model size caused the issue.

diegoseto · May 20, 2022, 4:07pm

pytorch_model.bin size in gb: 1.17GB
model.tar.gz size in gb: 1.08GB
Number of parameters: 315438720

I’m creating the model.tar.gz using the following command:

tar zcvf model.tar.gz *
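
A rough Python equivalent of that command, as a sketch. It assumes the model files live in a local directory and that any custom script sits in a code/ subfolder, which matches the /opt/ml/model/code/requirements.txt path in the container logs later in this thread. make_model_archive is a hypothetical helper; the point it illustrates is that entries sit at the archive root rather than under a parent folder.

```python
import tarfile
import tempfile
from pathlib import Path


def make_model_archive(model_dir, output_path):
    """Pack the contents of model_dir at the root of a gzip-compressed tar."""
    root = Path(model_dir)
    target = Path(output_path).resolve()
    with tarfile.open(output_path, "w:gz") as tar:
        # Add each top-level entry under its own name, like `tar zcvf ... *`
        # run from inside the directory (hidden files are skipped there too).
        for entry in sorted(root.iterdir()):
            if entry.name.startswith(".") or entry.resolve() == target:
                continue  # never pack hidden files or the archive itself
            tar.add(entry, arcname=entry.name)
    with tarfile.open(output_path, "r:gz") as tar:
        return tar.getnames()


# Demo on a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "model"
    (model_dir / "code").mkdir(parents=True)
    (model_dir / "config.json").write_text("{}")
    (model_dir / "code" / "inference.py").write_text("# custom handlers\n")
    names = make_model_archive(str(model_dir), str(Path(tmp) / "model.tar.gz"))

print(names)
```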

philschmid · May 20, 2022, 5:10pm

@diegoseto I created a whole e2e example using jonatasgrosman/wav2vec2-large-xlsr-53-english and didn’t see any error, with the same model size and everything. (I removed the language model folder to have an average folder size.)

https://gist.github.com/philschmid/bef900bf481929613df0b5dbe9484d5b


diegoseto · May 20, 2022, 6:21pm

It works perfectly fine now when overriding only the “model_fn” function; I probably made a mistake when overriding the other functions. Thank you very much for your help and your time.

diegoseto · May 20, 2022, 10:23pm

@philschmid another question, if you could help me. I’m running the model on my local machine with a language model set up. My directory structure:

Locally (loading the model using the pipeline object), the language model works fine at inference, but when deployed to SageMaker it apparently does not use the LM (I’m comparing the inference results). Everything is the same as locally: the pipeline, the model and the transformers version (4.17.0).

Did I forget something?

philschmid · May 23, 2022, 11:36am

How did you set up your local env? Did you install additional dependencies? Have you installed KenLM following these steps for your local env: Boosting Wav2Vec2 with n-grams in 🤗 Transformers?
I think KenLM is not yet available in the DLC

diegoseto · May 23, 2022, 7:45pm

hi @philschmid

Yes, I followed the steps in the article you mentioned. If KenLM is not available in the DLC, the other way is to override the predict_fn function in a custom inference script, right? If so, do you have any examples for Wav2Vec2 (like the other script you made overriding only model_fn)?

Thanks

philschmid · May 23, 2022, 7:58pm

I think this wouldn’t solve the missing KenLM. What you could do is use os.system('install kenlm') at the top of your inference.py to install it on startup (it needs to finish in under 2 minutes; I am not sure what the behavior is for serverless).

diegoseto · May 23, 2022, 8:32pm

I tried this but I’m still getting the same result.

inference.py:

import os
from transformers import pipeline

os.system('install kenlm')

def model_fn(model_dir):

    pipe = pipeline('automatic-speech-recognition', model_dir, chunk_length_s = 10)
    
    return pipe

philschmid · May 24, 2022, 6:46am

@diegoseto with os.system('install kenlm') I meant adding the actual steps to install KenLM.

diegoseto · May 25, 2022, 7:58pm

hi @philschmid

I tried to install KenLM following the steps of the article using os.system, and the commands seem to work fine, but I got this error at prediction time:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from model with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "module kenlm has no attribute Model"
}
".

My inference.py:

import os
from transformers import pipeline

os.system('sudo apt install build-essential cmake libboost-system-dev libboost-thread-dev libboost-program-options-dev libboost-test-dev libeigen3-dev zlib1g-dev libbz2-dev liblzma-dev')

os.system('wget -O - https://kheafield.com/code/kenlm.tar.gz | tar xz')

os.system('mkdir kenlm/build && cd kenlm/build && cmake .. && make -j2')

def model_fn(model_dir):

    pipe = pipeline('automatic-speech-recognition', model_dir, chunk_length_s = 10)
    
    return pipe
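
One possible reading of the "module kenlm has no attribute Model" error (an assumption, not confirmed in this thread): the wget | tar xz step leaves a kenlm/ source folder in the working directory, and a plain folder with that name can be picked up by import kenlm as an empty namespace package instead of the compiled Python binding. A small sketch to check which module actually gets imported; describe_module is a hypothetical helper.

```python
import importlib


def describe_module(name, attribute):
    """Report where `name` was imported from and whether it has `attribute`."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return {"importable": False, "file": None, "has_attribute": False}
    return {
        "importable": True,
        # Namespace packages (plain folders without __init__.py) have no file.
        "file": getattr(module, "__file__", None),
        "has_attribute": hasattr(module, attribute),
    }


# In inference.py one would call describe_module("kenlm", "Model") and log it;
# the stdlib json module is used here only so the example runs anywhere.
report = describe_module("json", "dumps")
print(report)
```

If the report for kenlm shows file as None and has_attribute as False, the import is probably resolving to the source folder rather than to an installed binding.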

I also tried to install the kenlm module via requirements.txt, but got a different error:

UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-inference-2022-05-25-19-13-03-317: Failed. Reason: Received server error (0) from model with message "An error occurred while handling request as the model process exited.". See https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#logEventViewer:group=/aws/sagemaker/Endpoints/huggingface-pytorch-inference-2022-05-25-19-13-03-317 in account 094463604469 for more information..

Checking the logs, it looks like I’m getting a permission denied error on the src directory (created by the kenlm module setup):

OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
python: can't open file '/usr/local/bin/deep_learning_container.py': [Errno 13] Permission denied
OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
Defaulting to user installation because normal site-packages is not writeable
Obtaining kenlm from git+https://github.com/kpu/kenlm@master#egg=kenlm (from -r /opt/ml/model/code/requirements.txt (line 1))
ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/src'
Check the permissions.
WARNING: There was an error checking the latest version of pip.
2022-05-25 19:15:11,902 - sagemaker-inference - ERROR - failed to install required packages, exiting
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/sagemaker_inference/model_server.py", line 189, in _install_requirements
    subprocess.check_call(pip_install_cmd)
  File "/opt/conda/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '-m', 'pip', 'install', '-r', '/opt/ml/model/code/requirements.txt']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/bin/dockerd-entrypoint.py", line 23, in <module>
    serving.main()
  File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/serving.py", line 34, in main
    _start_mms()
  File "/opt/conda/lib/python3.8/site-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/opt/conda/lib/python3.8/site-packages/retrying.py", line 206, in call
    return attempt.get(self._wrap_exception)
  File "/opt/conda/lib/python3.8/site-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/opt/conda/lib/python3.8/site-packages/six.py", line 719, in reraise
    raise value
  File "/opt/conda/lib/python3.8/site-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/serving.py", line 30, in _start_mms
    mms_model_server.start_model_server(handler_service=HANDLER_SERVICE)
  File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/mms_model_server.py", line 91, in start_model_server
    _install_requirements()
  File "/opt/conda/lib/python3.8/site-packages/sagemaker_inference/model_server.py", line 192, in _install_requirements
    raise ValueError("failed to install required packages")
ValueError: failed to install required packages

My requirements.txt (I also tried to install via pip using os.system):

-e git+https://github.com/kpu/kenlm@master#egg=kenlm
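
A possible cause of the Permission denied: '/src' error (an assumption based on pip's documented behaviour, not verified on this container): with -e, pip clones a VCS requirement into a src/ folder under the current working directory when it is not running inside a virtualenv, and the log suggests that location is not writable. A sketch of a helper that rewrites such lines to the non-editable form; to_non_editable is a hypothetical name, and whether kenlm then compiles inside the DLC is a separate question.

```python
def to_non_editable(requirement_line):
    """Drop a leading -e/--editable flag from a requirements.txt line."""
    stripped = requirement_line.strip()
    for flag in ("--editable", "-e"):
        if stripped.startswith(flag + " ") or stripped.startswith(flag + "="):
            return stripped[len(flag):].lstrip(" =")
    return stripped


line = "-e git+https://github.com/kpu/kenlm@master#egg=kenlm"
print(to_non_editable(line))
```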
