Image-to-text models tailored for web scraping?

Hello everyone 👋! My task is to recognize text on typical layouts: screenshots of product web pages from a supplier's site, from which I extract product names, prices, and specifications. I am using GPT-4 Vision, and it works great. On average, my usage is about 1,000 prompt tokens and 300 completion tokens per page.

I tried a combination of OCR (Tesseract) + LLM, feeding the OCR-recognized text directly to the LLM, but this didn't significantly reduce costs (especially when the language is not English).

My question is: in which direction should I experiment to make this process much more cost-effective without noticeable quality loss? I suspect the most reliable option is to find an image-to-text model on Hugging Face that works reasonably well out of the box and fine-tune it on my data. Or maybe there are existing models tailored for web scraping?
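To see why the OCR + LLM route barely moved the needle, it helps to put rough numbers on it. Below is a minimal sketch, assuming illustrative per-1K-token prices (placeholders, not any provider's actual rates) and the ~1,000 prompt / ~300 completion tokens mentioned above:

```python
# Rough per-page cost comparison. The prices below are illustrative
# placeholders -- substitute your provider's actual rates.

def cost_per_page(prompt_tokens, completion_tokens,
                  price_in_per_1k, price_out_per_1k):
    """Cost of one page in dollars for the given token counts and rates."""
    return (prompt_tokens / 1000) * price_in_per_1k \
         + (completion_tokens / 1000) * price_out_per_1k

# Token counts from the post: ~1000 prompt, ~300 completion per page.
vision = cost_per_page(1000, 300, price_in_per_1k=0.01, price_out_per_1k=0.03)

# OCR + text-only LLM: OCR output often tokenizes poorly for non-English
# text, so the prompt may not shrink much -- matching the observation above.
ocr_llm = cost_per_page(900, 300, price_in_per_1k=0.01, price_out_per_1k=0.03)

print(f"vision-only: ${vision:.4f}/page, OCR+LLM: ${ocr_llm:.4f}/page")
# -> vision-only: $0.0190/page, OCR+LLM: $0.0180/page
```

Since most of the cost sits in the prompt, the real lever is shrinking (or eliminating) the per-page prompt, which is what a small fine-tuned image-to-text model would do.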


Current recommendations that would work well for this use case include:

Tutorials on fine-tuning are available in my Transformers-Tutorials repo; see e.g. Transformers-Tutorials/PaliGemma at master · NielsRogge/Transformers-Tutorials · GitHub for PaliGemma.
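For a quick feel of what inference with PaliGemma looks like before committing to fine-tuning, here is a minimal sketch using the Transformers API. It assumes the `google/paligemma-3b-mix-224` checkpoint (gated; you need to accept the license on Hugging Face) and uses one of the mix checkpoints' short task prefixes such as `"ocr"`:

```python
def build_prompt(task: str = "ocr") -> str:
    # PaliGemma mix checkpoints are steered by short task prefixes
    # such as "ocr", "caption en", or "answer en <question>".
    return task

def run_paligemma(image_path: str, task: str = "ocr") -> str:
    # Heavy imports kept inside the function so the sketch can be read
    # without downloading the model weights.
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    model_id = "google/paligemma-3b-mix-224"  # assumed checkpoint
    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

    image = Image.open(image_path)
    inputs = processor(text=build_prompt(task), images=image,
                       return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=256)
    # Trim the echoed prompt tokens from the decoded output.
    return processor.decode(output[0][inputs["input_ids"].shape[-1]:],
                            skip_special_tokens=True)
```

The fine-tuning notebook in the repo linked above walks through adapting this to your own (screenshot, structured text) pairs, which is where the real cost savings would come from.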

Besides that, some other powerful models that might work well: