If the aim is classification rather than detailed content analysis, you probably don't need every single word transcribed exactly, so I think Qwen is fine. It's good in terms of raw performance too, but it particularly excels at multilingual performance. If you want something smaller, there's also SmolVLM2, though it may be too small.
I think a good approach would be to extract the text with a script (there are libraries that extract text from PDFs and the like) or a VLM, and then classify that text with BERT or an LLM.
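The extract-then-classify pipeline could be sketched as below. This is a minimal illustration, assuming pypdf for PDF text extraction and Hugging Face transformers for the classifier; the model name and input filename are placeholders, not recommendations, and you would swap in a model fine-tuned for your labels.

```python
def extract_text(pdf_path: str) -> str:
    """Pull plain text out of a PDF, page by page (uses pypdf)."""
    from pypdf import PdfReader  # deferred import so the pure helper below has no deps
    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def chunk_text(text: str, max_words: int = 300) -> list[str]:
    """Split long text into chunks that fit within BERT's 512-token limit."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

if __name__ == "__main__":
    from transformers import pipeline  # deferred import, same reason as above
    text = extract_text("document.pdf")  # hypothetical input file
    classifier = pipeline(
        "text-classification",
        model="distilbert-base-uncased-finetuned-sst-2-english",  # placeholder; use your fine-tuned model
    )
    # Classify each chunk and take a simple majority vote over chunk labels.
    labels = [classifier(chunk)[0]["label"] for chunk in chunk_text(text)]
    print(max(set(labels), key=labels.count))
```

The chunking step matters because BERT-family models truncate input at 512 tokens, so a long PDF needs to be split and the per-chunk predictions aggregated somehow (majority vote here is just the simplest choice).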
If you can accomplish this with BERT or a derivative model, that will be the most cost-effective option. An LLM naturally has high classification performance, but it's also large…