Which model to select

Assuming the document is an image, something around here would be a fit.
If it's text, I think Qwen2.5-0.5B or SmolLM would work.