I guess that there may be a good model for the following purpose. I have around 140 scanned PDFs with paper questionnaires. People have checked certain boxes or values on a likert scale. There is only one text box. I will scan these papers, but I asked myself if there maybe a model here that can be used out of the box to evaluate those scans. As I only have these 140 questionnaires, there is not enough data for a model training, I guess. Is there something so sophisticated to be used already?
No, there might be, but it’s more likely that I just don’t know about it.
Also, if you don’t mind using images instead of PDFs, it would be much easier. If you use a general-purpose VLM of a certain size, you can ask questions with text attached to the images, and they will answer. I think it’s possible with 8B or less. The larger the size, the higher the accuracy, but the operating environment becomes more demanding.
I can’t post a link, so try searching for “VL” in Spaces…
Hello everyone
A few days have passed and I have tried several of the models from the VL leaderboard that I found via spaces. However, I got none of them to work. Perhaps because I only have a laptop with a CPU. Do you know a good VQA model, as supposed here with 8B or so, that could also be run on a laptop for inference?
Best regards
Martin
You’ll need about 20GB of RAM to use the 8B model. Ideally, you’d want VRAM, but that’s not usually available on laptops.
You might be able to get by with 2B or 3B, but it’s pretty slow on a CPU.
You can use the Serverless Inference API for free up to 1000 times a day, so why not give it a try?