This HuggingFace discussion (https://discuss.huggingface.co/t/can-text-to-image-models-be-deployed-to-a-sagemaker-endpoint/20120) says that an inference.py needs to be created. I don't know what the Llava Llama model expects, though. I looked through the model's files, but I didn't find any relevant metadata about this.
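For context, here is a minimal sketch of what such an inference.py might look like. It assumes the checkpoint loads with transformers' AutoProcessor and LlavaForConditionalGeneration (only present in recent transformers releases) and that requests arrive as JSON with a prompt plus a base64-encoded image; both are my assumptions, not something confirmed by the model card:

```python
# code/inference.py inside model.tar.gz -- a sketch, not a verified handler.
import base64
import io

import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration


def model_fn(model_dir):
    # Called once at endpoint startup; model_dir is where SageMaker
    # unpacked model.tar.gz.
    processor = AutoProcessor.from_pretrained(model_dir)
    model = LlavaForConditionalGeneration.from_pretrained(
        model_dir, torch_dtype=torch.float16, device_map="auto"
    )
    return model, processor


def predict_fn(data, model_and_processor):
    # Expects JSON like {"prompt": "...", "image": "<base64 jpeg/png>"}
    # (my own request shape -- adapt it to whatever you decide to send).
    model, processor = model_and_processor
    image = Image.open(io.BytesIO(base64.b64decode(data["image"])))
    inputs = processor(text=data["prompt"], images=image, return_tensors="pt")
    inputs = inputs.to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256)
    text = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
    return {"generated_text": text}
```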
This Stack Overflow entry (https://stackoverflow.com/questions/76197446/how-to-do-model-inference-on-a-multimodal-model-from-hugginface-using-sagemaker) is about a serverless deployment, but it uses a custom TextImageSerializer. Should I try something like that?
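If an inference.py like the sketch above handles the image decoding itself, a custom serializer might not be needed at all: a plain JSON payload works with the stock JSONSerializer. Here is a sketch of the client side, with a hypothetical endpoint name and a LLaVA-1.5-style prompt template (an assumption about the checkpoint; verify against your model's actual prompt format):

```python
# Client-side sketch: plain JSON instead of a custom TextImageSerializer.
import base64

from sagemaker.deserializers import JSONDeserializer
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer

predictor = Predictor(
    endpoint_name="llava-llama-endpoint",  # hypothetical name -- use yours
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

with open("cat.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# The USER/ASSISTANT template is what LLaVA-1.5 checkpoints expect; other
# Llava Llama variants may have been trained with a different format.
response = predictor.predict({
    "prompt": "USER: <image>\nWhat is shown in this picture?\nASSISTANT:",
    "image": image_b64,
})
print(response["generated_text"])
```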
My own Stack Overflow question: https://stackoverflow.com/questions/77193088/how-to-perform-an-inference-on-a-llava-llama-model-deployed-to-sagemake-from-hug
Reddit: https://www.reddit.com/r/LocalLLaMA/comments/16pzn88/how_to_parametrize_a_llava_llama_model/