I have been using LLMs full time for 20 months, but I am new to vision-language models (VLMs).
I have about 10,000 high-resolution images (48 MP) of structures in which I want to detect defects. The textures in the images vary and can be grouped into roughly 12 categories. I figure a two-stage approach will help. I suspect I will have to break each 48 MP image into smaller tiles so the model can cope with the number of tokens, which raises the issue of overlapping the tiles to catch defects that span tile boundaries.

Stage 1: use a PEFT fine-tuned VLM to classify each image into one of the 12 texture categories.
Stage 2: route the image to one of several PEFT fine-tuned models, each specialised for one or a few texture categories, to detect the defects.

This will be run on batches of images overnight, so tokens/sec is not too important.
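To make the overlap question concrete, here is a minimal sketch of how the overlapped tile grid could be generated. The tile size (1024 px), overlap (128 px), and example image dimensions (8000x6000) are illustrative assumptions on my part, not fixed requirements:

```python
def tile_boxes(width, height, tile=1024, overlap=128):
    """Return (left, top, right, bottom) boxes covering an image.

    Adjacent tiles share `overlap` pixels, so a defect that straddles a
    tile boundary (up to the overlap size) appears whole in at least one tile.
    """
    stride = tile - overlap
    xs = list(range(0, max(width - tile, 0) + 1, stride))
    ys = list(range(0, max(height - tile, 0) + 1, stride))
    # Ensure the right/bottom edges are covered even when the stride
    # does not divide the image size evenly.
    if xs[-1] + tile < width:
        xs.append(width - tile)
    if ys[-1] + tile < height:
        ys.append(height - tile)
    return [(x, y, x + tile, y + tile) for y in ys for x in xs]

# A 48 MP frame at 8000x6000 with 1024 px tiles and 128 px overlap:
boxes = tile_boxes(8000, 6000, tile=1024, overlap=128)
```

A larger overlap reduces the chance of splitting a defect but increases the tile count (and overnight runtime), so the right value depends on the typical defect size in pixels.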
I am looking for suggestions on model selection, and also for critique of my approach above. References to relevant research papers are most welcome.
Thank you!