Sagemaker VQA Models (Donut)

BaileyQin · August 8, 2023, 3:29am

I have followed this wonderful tutorial by @philschmid on using and fine-tuning the Donut model for document understanding. (Thank you so much for the tutorial!)
https://www.philschmid.de/sagemaker-donut

I am trying to reproduce this for the VQA version of Donut:
https://huggingface.co/naver-clova-ix/donut-base-finetuned-docvqa

My first step was to deploy this base model on SageMaker to see if it works, but I am having some trouble. I am using the default “deploy to Amazon SageMaker” code provided in the link, but I changed 'HF_TASK':'document-question-answering' to 'HF_TASK':'visual-question-answering'.

The endpoint did spin up successfully, but I am having trouble feeding the model both the image data and the question. In the example from Philipp, an image serializer was used so that he could send the raw bytes of the image directly to the endpoint and get a result. My issue is that my input requires both an image and some text (the question). Some things I tried:

I tried just using JSON, but a PIL image is not JSON serializable.
I tried converting the PIL image to a NumPy array and then to a list, but that seems to run into a size issue (this is for a single image, not a batch).
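
For reference, here is a minimal sketch of a third option I have not yet verified against the endpoint: base64-encode the raw image file bytes and send them together with the question in a single JSON body. The key names "inputs", "image" and "question" are assumptions on my part, and the helper names build_payload and parse_payload are hypothetical. Whether the default handler decodes a base64 string may depend on the container and transformers version, so a custom inference script might still be needed.

```python
import base64
import json


def build_payload(image_bytes: bytes, question: str) -> str:
    """Pack raw image file bytes and a question into one JSON string."""
    # base64 turns arbitrary bytes into ASCII text, which JSON can carry.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # Assumed layout: the key names below are not confirmed for this endpoint.
    return json.dumps({"inputs": {"image": encoded, "question": question}})


def parse_payload(body: str) -> tuple:
    """Inverse of build_payload, e.g. for use in a custom inference script."""
    inputs = json.loads(body)["inputs"]
    return base64.b64decode(inputs["image"]), inputs["question"]


# Stand-in bytes for illustration; in practice these would come from
# reading the image file in binary mode, e.g. open(path, "rb").read().
image_bytes = b"\x89PNG\r\n\x1a\n" + bytes(range(256))

body = build_payload(image_bytes, "What is the invoice number?")
decoded_bytes, decoded_question = parse_payload(body)
```

Encoding the compressed file bytes adds roughly a third to the file size, whereas a nested list of pixel values serializes every channel of every pixel as text, which is probably why the NumPy-to-list attempt ran into the size problem. On the receiving side, the decoded bytes could be opened with PIL.Image.open(io.BytesIO(decoded_bytes)).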

On a side note, I noticed that the model also accepts image URLs. I tried using a URL, but got the following error: "'str' object is not callable".

My first goal is to get this base model running and to send it both local image data and a question for it to answer.

Any help would be greatly appreciated! Thanks
