I need a recommendation or advice on a fast VQA (visual question answering) model. I really don't know how to look for one

Hi everyone! I have a local project running on my laptop with an RTX 3060.
I am capturing images from a camera and analyzing them with a 2B image-text-to-text model. It is accurate enough but a bit slow, and I think a VQA model could improve efficiency. However, I don't know which metric to look at to tell whether a model is fast. Any recommendations, or is there a better alternative for my problem?
Thanks.
