Question answering fine-tuning

There are multiple ways to solve this. As you’re working with invoices, I’d expect a vision-language model to perform better than a text-only one.

See our blog post on document AI for an overview: Accelerating Document AI. Layout-aware models like LayoutLM typically do better on documents than text-only models like DistilBERT, because they also see where each word sits on the page.
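To make the layout part concrete: LayoutLM-style models take, next to each word, a bounding box scaled to a 0-1000 range relative to the page size. Below is a minimal sketch of that normalization step. The OCR words, pixel boxes, and page size are made-up values for illustration, and `normalize_box` is just a helper name I picked here, not a library function.

```python
def normalize_box(box, page_width, page_height):
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 range."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]


# Hypothetical OCR output for one invoice page of 1240 x 1754 pixels.
words = ["Invoice", "total:", "$1,250.00"]
pixel_boxes = [(100, 200, 220, 230), (230, 200, 310, 230), (320, 200, 450, 230)]

# One normalized box per word, in the same order as `words`.
boxes = [normalize_box(b, 1240, 1754) for b in pixel_boxes]
print(list(zip(words, boxes)))
```

The resulting word/box pairs are what you would then hand to the model's processor or tokenizer, alongside the page image if the model variant uses one.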

Nowadays there are also a lot of generative document AI models, including PaliGemma, Idefics2, LLaVa, and others, besides Donut, Pix2Struct, and UDOP.

You can find demo notebooks for all of those here: GitHub - NielsRogge/Transformers-Tutorials (demos I made with the Transformers library by HuggingFace).

Another option is to fine-tune a text-only LLM on OCR-ed text, as I explained here: Fine tune LLMs on PDF Documents - #9 by nielsr
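For the OCR route, the data preparation could look roughly like the sketch below: put the OCR words back into reading order, then wrap them in a prompt/completion pair for supervised fine-tuning. Everything here is an assumption for illustration: the word dictionaries with `x`/`y` pixel coordinates, the 10-pixel line grouping, the prompt template, and the helper names `ocr_to_text` and `build_example`. Your OCR engine's output format and your trainer's expected fields may well differ.

```python
import json


def ocr_to_text(words):
    """Join OCR words into lines, top-to-bottom then left-to-right."""
    lines = {}
    for w in words:
        # Crude heuristic: words whose top edge falls in the same
        # 10-pixel band are treated as being on the same line.
        lines.setdefault(w["y"] // 10, []).append(w)
    ordered = []
    for key in sorted(lines):
        row = sorted(lines[key], key=lambda w: w["x"])
        ordered.append(" ".join(w["text"] for w in row))
    return "\n".join(ordered)


def build_example(words, question, answer):
    """Build one prompt/completion training record from OCR-ed text."""
    prompt = f"Document:\n{ocr_to_text(words)}\n\nQuestion: {question}\nAnswer:"
    return {"prompt": prompt, "completion": " " + answer}


# Hypothetical OCR output, deliberately out of reading order.
ocr_words = [
    {"text": "total:", "x": 230, "y": 200},
    {"text": "Invoice", "x": 100, "y": 201},
    {"text": "$1,250.00", "x": 320, "y": 203},
    {"text": "ACME", "x": 100, "y": 50},
    {"text": "Corp", "x": 170, "y": 52},
]

example = build_example(ocr_words, "What is the invoice total?", "$1,250.00")
print(json.dumps(example))  # one JSON record, e.g. one line of a JSONL training file
```

The band-based line grouping breaks down on skewed scans or multi-column layouts, so treat it as a starting point rather than something to rely on for real invoices.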
