A model I can use with video data?

OmerKuru · June 20, 2025, 2:44pm

Hello everyone. I’ve been exploring and researching VLMs for a while now. However, there are so many purpose-built and specialized models that I’m a bit confused, so I decided to ask here. I built a model for quality control of a material, but it is a CNN that works on still images. My goal is to use a VLM to perform the same inspection on video input. Our objective is a model that can detect defects in a coating surface from microscope images taken at high magnification. Is there a VLM you could recommend for this purpose?

John6666 · June 20, 2025, 2:55pm

For video interpretation, these models may be easier to fine-tune. For still images, there are countless other options.

In addition, super-resolution and upscaling could be used to pre-process the magnified images, but there is a risk that the interpolation process generates detail that does not exist in the original image.

github.com/huggingface/diffusers

Add Ultimate SD Upscale pipeline for high-quality tiled image upscaling

opened 01:36PM - 22 Oct 24 UTC

BasimBashir

**Is your feature request related to a problem? Please describe.**
Currently, diffusers library lacks advanced tiled upscaling capabilities that are available in other Stable Diffusion implementations. While the library supports basic img2img and upscaling, there's no built-in solution for handling large images through intelligent tiling and seam fixing. This makes it difficult to process high-resolution images while maintaining quality and managing memory efficiently.

**Describe the solution you'd like**
Implement Ultimate SD Upscale functionality (similar to Automatic1111's WebUI extension) as a pipeline in diffusers. Key features should include:

1. Progressive upscaling with intelligent scale factor determination
2. Multiple tiling modes:
   - Linear processing
   - Chess pattern processing
3. Advanced seam fixing options:
   - Band pass mode
   - Half tile offset
   - Half tile with intersections
4. Configurable parameters:
   - Tile sizes
   - Padding
   - Mask blur
   - Denoise strength for seam fixing

The implementation should integrate smoothly with existing diffusers pipelines and maintain the library's user-friendly API style.

**Describe alternatives you've considered**
- Using basic tiling without seam fixing (leads to visible artifacts)
- Running multiple separate upscale passes (inefficient and lower quality)
- Implementing as a separate package (loses benefits of diffusers' optimization and integration)
- Using other libraries like PIL or cv2 for tiling (lacks SD-specific optimizations)

**Additional context**
- Reference implementation: https://github.com/Coyote-A/ultimate-upscale-for-automatic1111
- This feature would be particularly valuable for:
  - Professional image upscaling
  - Batch processing of large images
  - Creating high-resolution outputs while managing VRAM
  - Maintaining image quality in tiled processing
- Could potentially be implemented as either a standalone pipeline or an enhancement to existing img2img pipelines
- Would complement existing super-resolution models in the library
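To make the tiling idea in the issue above a bit more concrete, here is a minimal sketch of how padded tile boxes could be computed in a "linear" or "chess" order. The function name `tile_boxes`, its parameters, and the exact meaning given to the two orders are assumptions for illustration only; this is not the diffusers API and not the reference extension's code.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def tile_boxes(width: int, height: int, tile: int = 512,
               padding: int = 32, order: str = "linear") -> List[Box]:
    """Return padded tile boxes that cover a width x height image.

    Hypothetical helper: "linear" visits tiles row by row; "chess" is
    assumed here to mean visiting one colour of a checkerboard first.
    """
    cols = -(-width // tile)   # ceiling division
    rows = -(-height // tile)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    if order == "chess":
        # Stable sort: cells with even (row + col) first, then odd ones,
        # so directly adjacent tiles are not processed back to back.
        cells.sort(key=lambda rc: (rc[0] + rc[1]) % 2)
    boxes = []
    for r, c in cells:
        # Grow each tile by `padding` pixels and clamp to the image bounds,
        # so neighbouring tiles overlap and seams can be blended later.
        left = max(c * tile - padding, 0)
        top = max(r * tile - padding, 0)
        right = min((c + 1) * tile + padding, width)
        bottom = min((r + 1) * tile + padding, height)
        boxes.append((left, top, right, bottom))
    return boxes


if __name__ == "__main__":
    for box in tile_boxes(1024, 1024, order="chess"):
        print(box)
```

Each box could then be cropped from a frame (for example with PIL's `Image.crop`), processed on its own, and pasted back. The overlap from `padding` is what leaves room for seam fixing, and processing per tile is what keeps memory use bounded on large microscope images.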

OmerKuru · June 22, 2025, 5:12pm

Hey John, thanks a lot for your answers. I will check these models.
