Hello everyone. I’ve been exploring and researching VLMs for a while now, but there are so many purpose-built and specialized models that I’m a bit confused, so I decided to ask here. I built a quality-control model for a material, but it is a CNN trained on still images. My goal is to use a VLM to perform the same inspection on video input. The objective is a model that can detect defects on a coating surface in highly magnified microscope images. Is there a VLM you could recommend for this purpose?
For video interpretation, these models may be easier to fine-tune. For still images, there are countless other options.
In addition, super-resolution or upscaling could be used as a pre-processing step for the magnified images, but there is a risk that the interpolation generates detail that does not exist in the original image.
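To make the upscaling risk concrete, here is a minimal sketch using only NumPy. The 4x4 "coating edge" image and the helper names (`upscale_nearest`, `upscale_linear`, `new_values`) are made up for illustration, and plain linear interpolation stands in for whatever super-resolution method you would actually use. It checks whether the upscaled image contains intensity values that were never present in the original.

```python
import numpy as np

def upscale_nearest(img, factor):
    """Repeat each pixel; every output value is copied from the input."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def upscale_linear(img, factor):
    """Separable linear interpolation: along x first, then along y."""
    h, w = img.shape
    # Output sample positions expressed in original pixel coordinates.
    ys = np.linspace(0, h - 1, h * factor)
    xs = np.linspace(0, w - 1, w * factor)
    rows = np.stack([np.interp(xs, np.arange(w), r) for r in img])
    return np.stack([np.interp(ys, np.arange(h), c) for c in rows.T], axis=1)

def new_values(original, upscaled):
    """Intensity values in `upscaled` that do not occur in `original`."""
    return np.setdiff1d(np.round(upscaled, 6), np.round(original, 6))

# Hypothetical toy image: dark coating on the left, bright on the right.
img = np.zeros((4, 4))
img[:, 2:] = 1.0

nearest = upscale_nearest(img, 2)
linear = upscale_linear(img, 2)

print("new values after nearest-neighbour:", new_values(img, nearest))
print("new values after linear interpolation:", new_values(img, linear))
```

Nearest-neighbour only copies existing pixels, while the interpolated version produces intermediate intensities along the edge. Learned super-resolution models can go further and synthesize plausible-looking texture, so it is probably worth checking that any defect the model flags is also visible in the raw frame.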