Serverless memory problem when deploying Wav2Vec2 with custom inference code

Are you providing the model via an S3 URI or the hub configuration? You have to provide it via an s3:// path as model_data, since the hub configuration does not load the kenlm model.
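
For reference, the archive would typically bundle the processor files, the kenlm language model, and the custom inference code in a layout like the one below; the exact file names (e.g. 5gram.bin) are illustrative:

model.tar.gz
├── config.json
├── pytorch_model.bin
├── preprocessor_config.json
├── tokenizer_config.json
├── vocab.json
├── alphabet.json              # decoder alphabet for Wav2Vec2ProcessorWithLM
├── language_model/
│   ├── 5gram.bin              # kenlm model (illustrative name)
│   ├── attrs.json
│   └── unigrams.txt
└── code/
    ├── inference.py           # custom inference code
    └── requirements.txt       # extra pip dependencies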

I’m providing the model via S3:

from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.serializers import DataSerializer

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.17.0',  # transformers version of the inference container
	pytorch_version='1.10.2',       # pytorch version of the inference container
	py_version='py38',              # python version of the inference container
	model_data='s3://sagemaker-us-east-2-094463604469/model.tar.gz',  # S3 URI of the model archive
	role=role,                      # IAM role with SageMaker permissions
)
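
For the serverless part, the deploy call would look roughly like this; memory_size_in_mb=6144 is the current serverless maximum and an assumption on my side, but raising it is usually the first thing to try when the container runs out of memory:

from sagemaker.serverless import ServerlessInferenceConfig

# serverless endpoint configuration; 6144 MB is the maximum memory size
# (an assumption here) -- raise this if the worker is killed for memory
serverless_config = ServerlessInferenceConfig(
	memory_size_in_mb=6144,
	max_concurrency=1,
)

# DataSerializer lets the predictor send raw audio bytes to the endpoint
audio_serializer = DataSerializer(content_type='audio/x-audio')

predictor = huggingface_model.deploy(
	serverless_inference_config=serverless_config,
	serializer=audio_serializer,
)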

You might also need to provide a requirements.txt to install transformers==4.19.2; I’m not sure the LM-boosted decoding was already available in 4.17.
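
As a sketch, the code/requirements.txt inside the archive could look like this; installing kenlm from the GitHub archive is the route pyctcdecode commonly documents, so treat that line as an assumption:

# newer transformers with LM-boosted Wav2Vec2 decoding
transformers==4.19.2
# decoder and kenlm bindings used by Wav2Vec2ProcessorWithLM
pyctcdecode
https://github.com/kpu/kenlm/archive/master.zip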

Even after adding transformers==4.19.2 as a dependency in requirements.txt, I get the same problem. I’m using transformers 4.17 on my local machine and it works fine; I only run into problems when the model is deployed on SageMaker.