CUDA out-of-memory error on unchanged workshop 1 notebooks

I am running notebooks 1 and 3 unchanged from philschmid/huggingface-sagemaker-workshop-series (workshop_1_getting_started_with_amazon_sagemaker, main branch) on GitHub.

And I get the following error:

RuntimeError: CUDA out of memory. Tried to allocate 192.00 MiB (GPU 0; 15.78 GiB total capacity; 14.80 GiB already allocated; 44.75 MiB free; 14.83 GiB reserved in total by PyTorch)

I have tried different batch sizes and learning rates. Can someone help me understand why not everyone gets the same error if we're all using the same AWS resources?
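For what it's worth, the usual first response to this error is to shrink the batch until the rough memory estimate fits the card. Here is a minimal, hypothetical sketch of that reasoning (the per-sample and fixed memory figures are made-up illustrative numbers, not measurements from the workshop notebooks):

```python
def fit_batch_size(initial_batch_size, per_sample_mib, fixed_mib, gpu_mib):
    """Halve the batch size until the rough memory estimate fits the GPU.

    per_sample_mib: approximate activation memory per sample (an assumption;
    in practice you would measure this empirically).
    fixed_mib: memory for model weights, gradients, and optimizer state.
    gpu_mib: usable GPU memory.
    """
    batch_size = initial_batch_size
    while batch_size > 1 and fixed_mib + batch_size * per_sample_mib > gpu_mib:
        batch_size //= 2  # activation memory scales roughly linearly with batch size
    return batch_size

# Illustration: a ~16 GiB GPU, ~6 GiB fixed cost, ~350 MiB of activations per sample.
# A batch of 32 would need ~17.3 GiB and OOM; halving once to 16 fits.
print(fit_batch_size(32, 350, 6144, 16160))  # → 16
```

This is only a back-of-the-envelope model; real memory use also depends on sequence length, mixed precision, and gradient checkpointing.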

Hello @kjackson,

I have now run the notebook twice, as-is from main, and never got any error.

2021-12-01 08:46:23 Uploading - Uploading generated training model
2021-12-01 08:48:23 Completed - Training job completed
ProfilerReport-1638347816: NoIssuesFound
Training seconds: 490
Billable seconds: 490
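If the error comes back on your side, the usual workaround is to lower the training batch size (and optionally enable mixed precision) in the hyperparameters passed to the estimator. A hedged sketch below; the key names (`epochs`, `train_batch_size`, `fp16`, `model_name`) are assumptions based on a typical `train.py`, so check the argument names your training script actually parses:

```python
# Hypothetical hyperparameters dict for the HuggingFace SageMaker estimator.
# Key names are assumptions -- match them to what your train.py expects.
hyperparameters = {
    "epochs": 1,
    "train_batch_size": 16,                    # halved to reduce activation memory
    "fp16": True,                              # mixed precision roughly halves activation memory
    "model_name": "distilbert-base-uncased",   # placeholder model id
}

print(sorted(hyperparameters))
```

These would then be passed as `hyperparameters=hyperparameters` when constructing the estimator, exactly as the notebook already does for its defaults.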