Failed pulling HF image on SageMaker as of mid-morning; edit: now working

We have an existing SageMaker endpoint configuration that uses a model we trained with Hugging Face. It has been working for months and still worked earlier this morning. However, since about 9 am EDT today we can no longer launch the endpoint. The error we get is a permissions error on the Hugging Face image:
The role ‘arn:aws:iam::…’ does not have BatchGetImage permission for the image: ‘763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:2.1-transformers4.37-gpu-py310-cu118-ubuntu20.04’.
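If you want to rule out the role itself, the sketch below builds the kind of identity-based policy a role typically needs to pull one image from ECR: `ecr:GetAuthorizationToken` (which is not resource-scoped, hence `"*"`) plus `ecr:BatchCheckLayerAvailability`, `ecr:GetDownloadUrlForLayer` and `ecr:BatchGetImage` on the repository. The helper name `pull_policy_for` and the URI-parsing regex are hypothetical illustrations, not part of any AWS SDK; AWS-managed SageMaker policies often already grant these actions, so treat this as a checklist rather than a required fix.

```python
import json
import re

# Hypothetical pattern for a private ECR image URI:
#   <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repo>[^:@]+):(?P<tag>[^:@]+)$"
)


def pull_policy_for(image_uri: str) -> dict:
    """Build a minimal IAM policy document allowing a role to pull one ECR image (sketch)."""
    m = ECR_URI.match(image_uri)
    if m is None:
        raise ValueError(f"not an ECR image URI: {image_uri!r}")
    # Repository ARN format: arn:aws:ecr:<region>:<account>:repository/<name>
    repo_arn = f"arn:aws:ecr:{m['region']}:{m['account']}:repository/{m['repo']}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Registry login token; this action only accepts "*" as its resource.
                "Effect": "Allow",
                "Action": "ecr:GetAuthorizationToken",
                "Resource": "*",
            },
            {
                # Manifest and layer reads; BatchGetImage is the action named in the error.
                "Effect": "Allow",
                "Action": [
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchGetImage",
                ],
                "Resource": repo_arn,
            },
        ],
    }


uri = (
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "huggingface-pytorch-inference:2.1-transformers4.37-gpu-py310-cu118-ubuntu20.04"
)
print(json.dumps(pull_policy_for(uri), indent=2))
```

Comparing this against the role's attached policies in the IAM console is one way to confirm whether anything changed on your side.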

I’m wondering whether the image no longer exists or is no longer supported. We certainly haven’t seen any notice that our existing endpoint configurations would stop working.
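One way to check whether the tag still exists is to ask ECR directly with boto3's `batch_get_image`, which reports a missing tag in its `failures` list (for example with `failureCode` `"ImageNotFound"`) and raises a `botocore.exceptions.ClientError` for access problems. The helper names `check_image` and `check_hf_image` are hypothetical; the client is passed in so the logic can be exercised without AWS. This is a sketch that assumes your own credentials are allowed to read from the AWS Deep Learning Containers registry; it runs as your identity, not as the endpoint's execution role.

```python
def check_image(ecr_client, registry_id: str, repository: str, tag: str) -> str:
    """Return "found", or the failureCode ECR reports for the tag (sketch).

    Access errors are not caught here: boto3 raises
    botocore.exceptions.ClientError (e.g. AccessDeniedException).
    """
    resp = ecr_client.batch_get_image(
        registryId=registry_id,
        repositoryName=repository,
        imageIds=[{"imageTag": tag}],
    )
    if resp.get("images"):
        return "found"
    failures = resp.get("failures") or []
    return failures[0].get("failureCode", "unknown") if failures else "unknown"


def check_hf_image(region: str = "us-east-1") -> str:
    """Check the specific Hugging Face inference image from the error message."""
    import boto3  # imported lazily; requires configured AWS credentials

    ecr = boto3.client("ecr", region_name=region)
    return check_image(
        ecr,
        registry_id="763104351884",
        repository="huggingface-pytorch-inference",
        tag="2.1-transformers4.37-gpu-py310-cu118-ubuntu20.04",
    )
```

Calling `check_hf_image()` from a shell with AWS credentials would distinguish "the tag is gone" (a failure code) from "access was denied" (an exception).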

Edit: as of 2 pm ET, this is working again.
