I would like to know which large language model, or combination of large language models, does the best job of accurately extracting names, dates, locations, and languages from images. Does Hugging Face provide any LLMs that accomplish this objective?
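Look into Donut, TrOCR, Mistral, and Llama.
I would possibly use a hybrid approach: run Donut on the image to get the text, then feed that output into Llama to pick out the names, dates, locations, and languages.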
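Here is a rough sketch of what that hybrid pipeline could look like, in case it helps. It is only an illustration under some assumptions: the function names (`build_prompt`, `parse_entities`, `extract_entities`, `make_trocr_ocr`) and the prompt wording are made up for this example, the LLM is passed in as a plain callable so you can plug in Llama, Mistral, or anything else, and the OCR helper uses TrOCR (`microsoft/trocr-base-printed`) rather than Donut simply because its usage is shorter. Note that TrOCR expects an image cropped to a single line of text, so full pages would need a text-detection step first.

```python
import json
from typing import Callable, Dict, List

# The four entity types asked about in the question.
FIELDS = ("names", "dates", "locations", "languages")


def build_prompt(ocr_text: str) -> str:
    """Build an instruction asking the LLM for a JSON object (hypothetical wording)."""
    keys = ", ".join(f'"{field}"' for field in FIELDS)
    return (
        "Extract entities from the text below. Reply with a single JSON object "
        f"whose keys are {keys}, each mapping to a list of strings.\n\n"
        f"Text:\n{ocr_text}"
    )


def parse_entities(llm_reply: str) -> Dict[str, List[str]]:
    """Pull the outermost {...} span out of the reply and normalise it to lists."""
    empty = {field: [] for field in FIELDS}
    start, end = llm_reply.find("{"), llm_reply.rfind("}")
    if start == -1 or end <= start:
        return empty
    try:
        data = json.loads(llm_reply[start:end + 1])
    except json.JSONDecodeError:
        return empty
    result = {}
    for field in FIELDS:
        value = data.get(field, [])
        # Models sometimes return a bare string instead of a list.
        result[field] = [str(v) for v in value] if isinstance(value, list) else [str(value)]
    return result


def extract_entities(
    image_path: str,
    ocr: Callable[[str], str],
    llm: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Stage 1: image -> text (OCR). Stage 2: text -> entities (LLM)."""
    text = ocr(image_path)
    reply = llm(build_prompt(text))
    return parse_entities(reply)


def make_trocr_ocr(model_id: str = "microsoft/trocr-base-printed") -> Callable[[str], str]:
    """Return an OCR callable backed by TrOCR (assumes transformers, torch, Pillow)."""
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)

    def ocr(image_path: str) -> str:
        image = Image.open(image_path).convert("RGB")
        pixel_values = processor(images=image, return_tensors="pt").pixel_values
        generated_ids = model.generate(pixel_values)
        return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

    return ocr
```

To try it, you would call something like `extract_entities("line.png", make_trocr_ocr(), my_llm)`, where `my_llm` is whatever function sends a prompt to your chosen model and returns its text reply. Keeping the two stages as separate callables makes it easy to swap Donut in for the OCR step or compare different LLMs on the same OCR output.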
If you’re looking for something OCR-related, this recently released model might also be a good choice.