I want to use a 7B LLaVA model with Hugging Face, but I can’t find any docs on how to use it. Any help would be great.
I deployed a model to SageMaker with the SageMaker deployment card HF provides. Currently it is this model: hxxps://huggingface.co/liuhaotian/llava-llama-2-13b-chat-lightning-preview/discussions/3
However, one of my concerns is that the card sets
'HF_TASK': 'text-generation', whereas LLaVA Llama is really an image-to-text / visual ‘question-answering’ type of model.
This topic says that transformers needs some tinkering for this: hxxps://discuss.huggingface.co/t/can-text-to-image-models-be-deployed-to-a-sagemaker-endpoint/20120
So I still haven’t got it working. Also, I didn’t have enough quota on AWS to deploy it on a half-decent GPU instance, so whether the instance can carry the model at all is another open question. I’m surprised that no one has helped me so far, whether in the HF model discussions, the GitHub discussions (hxxps://github.com/haotian-liu/LLaVA/discussions/454), or other forums.
This Hugging Face discussion (hxxps://discuss.huggingface.co/t/can-text-to-image-models-be-deployed-to-a-sagemaker-endpoint/20120) says that an inference.py needs to be created. I don’t know what LLaVA Llama has for this, though. I looked through the model’s files, but I don’t see any relevant metadata about it.
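For reference, here is a rough sketch of what such an inference.py could look like. The hook names (model_fn, input_fn, predict_fn, output_fn) are the ones the SageMaker Hugging Face Inference Toolkit lets you override. Everything else is an assumption on my part: the JSON request shape ({"prompt": ..., "image": <base64>}) is made up for illustration, and the model loading and generation are placeholders, since I don’t know the correct loading code for this checkpoint.

```python
# code/inference.py (sketch only, not a working LLaVA handler)
import base64
import json


def model_fn(model_dir):
    """Load the model from model_dir.

    Placeholder: the real LLaVA loading code would go here.
    """
    return {"model_dir": model_dir}


def input_fn(input_data, content_type):
    """Decode a request of the assumed form {"prompt": str, "image": base64 str}."""
    if content_type != "application/json":
        raise ValueError(f"Unsupported content type: {content_type}")
    payload = json.loads(input_data)
    return {
        "prompt": payload["prompt"],
        "image_bytes": base64.b64decode(payload["image"]),
    }


def predict_fn(data, model):
    """Placeholder: echoes what was received instead of running the model."""
    return {
        "prompt": data["prompt"],
        "image_size_bytes": len(data["image_bytes"]),
    }


def output_fn(prediction, accept):
    """Serialize the prediction as JSON."""
    return json.dumps(prediction)
```

With this in place, the stubbed predict_fn at least lets you check that the image and prompt arrive intact at the endpoint before wiring in the actual model.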
This Stack Overflow entry (hxxps://stackoverflow.com/questions/76197446/how-to-do-model-inference-on-a-multimodal-model-from-hugginface-using-sagemaker) is about a serverless deployment case, but it uses a custom TextImageSerializer serializer. Should I try to use something like that?
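To make the serializer idea concrete, here is a minimal sketch of what I have in mind. It is not the TextImageSerializer from that answer. The class name and the payload shape (a JSON body with the prompt and a base64-encoded image) are my own assumptions, and the endpoint would need matching decoding code on the server side. It is written as a plain class so it runs without the SageMaker SDK; for a real predictor it would presumably subclass the SDK’s BaseSerializer, which expects a serialize(data) method and a CONTENT_TYPE.

```python
import base64
import json


class TextImageJSONSerializer:
    """Hypothetical serializer: packs a prompt and raw image bytes into one JSON body."""

    CONTENT_TYPE = "application/json"

    def serialize(self, data):
        # data is assumed to be a dict: {"prompt": str, "image_bytes": bytes}
        return json.dumps(
            {
                "prompt": data["prompt"],
                # JSON cannot carry raw bytes, so the image is base64-encoded.
                "image": base64.b64encode(data["image_bytes"]).decode("ascii"),
            }
        )


# Usage: build the request body for a prompt plus some image bytes.
body = TextImageJSONSerializer().serialize(
    {"prompt": "What is shown in this image?", "image_bytes": b"\x89PNG"}
)
```

Base64 inflates the image by about a third, so the endpoint’s request size limit may become an issue for large images.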
My Stack Overflow entry: hxxps://stackoverflow.com/questions/77193088/how-to-perform-an-inference-on-a-llava-llama-model-deployed-to-sagemake-from-hug