Some issues when training a model on SageMaker

ivanlau · November 23, 2021, 3:14pm

Hello world,
I’m running into two issues when fine-tuning my model with this SageMaker notebook.

1. No GUI login prompt appears when running notebook_login(); instead I get this:

[screenshot: notebook_login() output]

As a workaround, I’m using a hardcoded token.

2. I hit a ResourceLimitExceeded error when running huggingface_estimater.fit(…):

[screenshot: ResourceLimitExceeded error]
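For item 1, a minimal sketch of a safer variant of the hardcoded-token workaround. It assumes the token is kept in an environment variable named HF_TOKEN (the name is an assumption, not something the notebook requires) and otherwise falls back to a getpass prompt, which uses a plain text input box instead of ipywidgets. The helper name resolve_hf_token is hypothetical.

```python
import os
from getpass import getpass


def resolve_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return a Hugging Face token without hardcoding it in the notebook.

    Hypothetical helper: checks the environment variable `env_var` first
    (the default name is an assumption) and, if it is unset or empty,
    prompts with getpass, which does not depend on ipywidgets.
    """
    token = os.environ.get(env_var, "").strip()
    if token:
        return token
    # Plain stdin prompt; the typed token is not echoed back.
    return getpass("Hugging Face token: ").strip()
```

The returned string can then be used wherever the hardcoded token was, for example login(token=resolve_hf_token()) with the login function available in recent huggingface_hub releases.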

For item 2, I have opened a case with AWS Support to request a limit increase, but I expect a slow reply. Is there any other way to get around this while still getting a GPU speed-up from SageMaker?

FYI, I’m using my own AWS account (a Free Tier account, but with some credits).

Thanks.

philschmid · November 23, 2021, 3:51pm

Hello @ivanlau,

thanks for opening the thread.

To 1: where are you running the SageMaker notebook?

To 2: I think you can use ml.g4dn.xlarge instead. It also has 1 GPU and shouldn’t need a limit increase.
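A minimal sketch of trying instance types in order of preference and moving on only when the training job is rejected with ResourceLimitExceeded. It assumes the error surfaces as a botocore ClientError, whose response dict carries the code under ["Error"]["Code"]; the launch callable, the helper names, and the ml.p3.2xlarge first choice are hypothetical and only for illustration.

```python
from typing import Callable, Iterable, Optional

# Hypothetical preference order: ml.g4dn.xlarge is the suggestion above,
# ml.p3.2xlarge is just an illustrative first choice.
DEFAULT_INSTANCE_TYPES = ("ml.p3.2xlarge", "ml.g4dn.xlarge")


def _error_code(exc: Exception) -> Optional[str]:
    """Extract the AWS error code from a botocore-style ClientError, if present."""
    response = getattr(exc, "response", None)
    if isinstance(response, dict):
        return response.get("Error", {}).get("Code")
    return None


def fit_with_fallback(
    launch: Callable[[str], object],
    instance_types: Iterable[str] = DEFAULT_INSTANCE_TYPES,
) -> str:
    """Call launch(instance_type) for each type until one is not quota-blocked.

    `launch` is a hypothetical callable that builds the estimator with the
    given instance type and calls .fit(). Only ResourceLimitExceeded moves
    on to the next type; any other error is re-raised unchanged.
    Returns the instance type whose launch succeeded.
    """
    last_error: Optional[Exception] = None
    for instance_type in instance_types:
        try:
            launch(instance_type)
            return instance_type
        except Exception as exc:  # botocore.exceptions.ClientError in practice
            if _error_code(exc) != "ResourceLimitExceeded":
                raise
            last_error = exc
    raise RuntimeError("No listed instance type had available quota") from last_error
```

In a notebook, launch would construct sagemaker.huggingface.HuggingFace(..., instance_type=instance_type, instance_count=1, ...) and call .fit(...) on it. When every listed type is blocked (as happened here with ml.g4dn.xlarge too), this only fails faster; a quota increase is still required.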

ivanlau · November 23, 2021, 4:00pm

@philschmid

To 1:
I’m running it on an ml.t3.medium instance and opening it in the JupyterLab environment (conda_pytorchp36).

To 2:
I changed it to your suggested instance, but I still get the same error:

ivanlau · November 23, 2021, 4:01pm

Settings:

philschmid · November 23, 2021, 4:06pm

Can you test it in Jupyter Notebook instead of JupyterLab?

To 2: Okay, then I guess you need to open a support ticket. You can do this from the AWS Console via Service Quotas.
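Before filing the ticket, it can help to see the current value of the relevant quota. A minimal sketch using the Service Quotas API through boto3, assuming credentials are configured and that SageMaker training quotas are named like "ml.g4dn.xlarge for training job usage" (an assumption about the naming; spot-training quotas are excluded). Both helper names are hypothetical.

```python
from typing import Dict, List, Optional


def training_quotas_for(quotas: List[Dict], instance_type: str) -> List[Dict]:
    """Keep only the on-demand training-job quota entries for one instance type.

    Assumes quota names of the form "<instance type> for training job usage";
    entries such as "... for spot training job usage" do not match.
    """
    wanted = f"{instance_type} for training job usage"
    return [q for q in quotas if q.get("QuotaName") == wanted]


def list_sagemaker_quotas(region_name: Optional[str] = None) -> List[Dict]:
    """Fetch SageMaker quotas via the Service Quotas API (needs boto3 and credentials)."""
    import boto3  # imported here so the filter above works without boto3 installed

    client = boto3.client("service-quotas", region_name=region_name)
    quotas: List[Dict] = []
    kwargs = {"ServiceCode": "sagemaker"}
    while True:
        page = client.list_service_quotas(**kwargs)
        quotas.extend(page.get("Quotas", []))
        token = page.get("NextToken")
        if not token:
            return quotas
        kwargs["NextToken"] = token  # follow pagination until exhausted
```

For example, printing q["QuotaCode"] and q["Value"] for each entry of training_quotas_for(list_sagemaker_quotas(), "ml.g4dn.xlarge") shows whether the applied value is 0. Quotas without an applied value may not appear in this list, in which case the console page is the simplest place to check and request the increase.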

ivanlau · November 24, 2021, 11:59am

@philschmid
Hi,
I have tested it in Jupyter Notebook, and yes, it’s working fine there. It seems that ipywidgets in JupyterLab is either disabled by default or outdated, but I’m not sure. Anyway, I can keep working.

For 2, yes, I have already opened a ticket and am following up with them.

Anyway, thanks for the help, and for your notebook too.
This is my first time doing ML in the cloud. Learnt a lot.

Topic	Category	Replies	Views	Activity
Sagemaker Serverless Inference	Amazon SageMaker	22	8995	May 22, 2024
Issues using GPU with HuggingFace (TensorFlow) model deployed to SageMaker endpoint	Amazon SageMaker	0	617	December 12, 2023
Sagemaker Endpoint Not Using GPU for PygmalionAI	Amazon SageMaker	7	1798	April 18, 2024
CPU/Memory Utilization Too High When Running Inference on Falcon 40B Instruct	Amazon SageMaker	4	1573	August 31, 2023
CUDA error for inference on GPU instance	Amazon SageMaker	2	759	May 16, 2023
