Using S3 as model cache for Huggingface LLM inference DLC on Sagemaker


Is it possible to use the Hugging Face LLM inference container for SageMaker (Introducing the Hugging Face LLM Inference Container for Amazon SageMaker) in a way that lets me specify the path to an S3 bucket where I already have the models downloaded and ready for use, instead of downloading them from the internet? Essentially, using the S3 path as an HF_HUB cache, or using the S3 path to download the models onto the local container.

This is useful in cases:

  • where we can’t connect to the internet
  • where we have fine-tuned models stored on S3

Thank you!

We released a blog post on how to do this: Securely deploy LLMs inside VPCs with Hugging Face and Amazon SageMaker
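In short, the pattern described there can be sketched as: package the weights and tokenizer as a `model.tar.gz` in S3, pass that as `model_data`, and set `HF_MODEL_ID=/opt/ml/model` so the container loads from the local copy SageMaker extracts instead of pulling from the Hub. A minimal sketch with the SageMaker Python SDK — the bucket path, container version, and instance type below are placeholders you would replace with your own:

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()  # or an explicit IAM role ARN

# Resolve the URI of the Hugging Face LLM inference (TGI) container
image_uri = get_huggingface_llm_image_uri("huggingface")

model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    # model.tar.gz with the model weights/tokenizer at its root (placeholder path)
    model_data="s3://my-bucket/my-model/model.tar.gz",
    env={
        # Point the container at the local directory where SageMaker
        # extracts model_data, so nothing is fetched from the internet.
        "HF_MODEL_ID": "/opt/ml/model",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # placeholder instance type
    container_startup_health_check_timeout=600,
)
```

Because the model is loaded from `/opt/ml/model`, this also works for endpoints in a VPC without internet access, and for fine-tuned models you have only in S3.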