About the Amazon SageMaker category

Hi @philschmid, hope you are fine.
I have a huge model: it requires 50 GB of system RAM, and after loading it onto a GPU it requires 14 GB of GPU RAM. We will most likely go with g4dn.12xlarge, which has 4 GPUs; we will deploy one model replica per GPU and expose a single API to interact with them.
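The one-replica-per-GPU, single-API layout can be sketched as a round-robin dispatcher in plain Python. `MultiGpuDispatcher` and the stand-in callables below are hypothetical names, just to show the routing; real replicas would be the model moved to `cuda:0` … `cuda:3`:

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


class MultiGpuDispatcher:
    """Route requests from one API across one model replica per GPU."""

    def __init__(self, replicas):
        # replicas: one callable per GPU, e.g. a model already moved to cuda:i
        self._replicas = list(replicas)
        self._next_idx = itertools.cycle(range(len(self._replicas)))
        # one worker per replica so the GPUs can serve concurrently
        self._pool = ThreadPoolExecutor(max_workers=len(self._replicas))

    def predict(self, request):
        idx = next(self._next_idx)  # round-robin choice of replica
        return self._pool.submit(self._replicas[idx], request).result()


# stand-in "replicas" instead of real GPT-J copies on 4 GPUs
dispatcher = MultiGpuDispatcher([lambda r, i=i: f"gpu{i}:{r}" for i in range(4)])
print([dispatcher.predict("hello") for _ in range(5)])
# ['gpu0:hello', 'gpu1:hello', 'gpu2:hello', 'gpu3:hello', 'gpu0:hello']
```

A production setup would put this behind a real HTTP server and add batching, but the routing idea is the same.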
So, is it possible to import a Hugging Face pretrained model into SageMaker and then deploy it?

Thanks for any help.

Hey @m-ali-awan,

We are working hard on an inference solution for Hugging Face on SageMaker to make deploying models as easy as possible. The current estimate for release is early July, so if you can wait a couple of weeks we will have a nice solution for you. If you cannot wait, you can use the plain PyTorch implementation to deploy your model, but it is considerably more complicated.
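For reference, the plain-PyTorch route means writing your own inference script. Below is a minimal sketch of such a script, assuming the model is loadable with `transformers`; the `model_fn`/`input_fn`/`predict_fn`/`output_fn` hook names are the ones the SageMaker PyTorch serving container looks for, but the loading and generation details here are illustrative assumptions, not a tested deployment:

```python
# inference.py -- entry-point script for a SageMaker PyTorch endpoint.
import json


def model_fn(model_dir):
    # Called once per worker to load the model; requires torch and
    # transformers to be installed in the serving container.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir).to("cuda").eval()
    return model, tokenizer


def input_fn(request_body, content_type="application/json"):
    # Deserialize the incoming request.
    if content_type != "application/json":
        raise ValueError(f"Unsupported content type: {content_type}")
    return json.loads(request_body)["inputs"]


def predict_fn(text, model_and_tokenizer):
    # Run generation on the loaded model.
    model, tokenizer = model_and_tokenizer
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    out = model.generate(ids, max_length=64)
    return tokenizer.decode(out[0], skip_special_tokens=True)


def output_fn(prediction, accept="application/json"):
    # Serialize the response.
    return json.dumps({"generated_text": prediction})
```

You would then package the model artifacts together with this script and deploy via the SageMaker Python SDK's `PyTorchModel`; that packaging step is where most of the extra custom work comes in.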

Hi @philschmid, thanks a lot; you always respond to my queries. :relaxed:

It would be great if you could share the link to the PyTorch deployment.

Further, my model is GPT-J, converted from JAX to PyTorch. As you know, deployment on AWS Inferentia chips is cheaper and more efficient.
So, is it possible to compile GPT-J with the Neuron SDK? There is an example for Hugging Face BERT, but I am not sure about GPT, since Neuron compilation does not work for all kinds of models.

Thanks again for all your help.

Regarding Inferentia, I am not sure whether compiling GPT-J works; you would need to test it. You also need to be careful to use the right configuration when compiling with Neuron.

Here is a PyTorch example that requires a lot of custom code: GitHub - aws-samples/amazon-sagemaker-bert-pytorch.

Thanks a lot.

Hi @philschmid, hope you are fine.
Is there a Discord channel for Hugging Face?
If so, kindly share the invite.
Thanks!