A model for converting photos to videos with 8GB of VRAM

Hello,

I use ComfyUI. My graphics card model is NVIDIA GeForce RTX 5060 Ti with 8GB VRAM. Which model can I use to convert photos to videos?

Thank you.

Try Wan…

Hello,

Thank you so much for your reply.

1- Is the Wan model the one demonstrated with the photo of a man playing the guitar?

2- In the video you introduced, the model requires PyTorch 2.7, but ComfyUI uses a higher version of PyTorch. Doesn’t this cause a conflict?

Hello,
I followed the steps at How to Run Wan2.2 Image to Video GGUF Models in ComfyUI (Low VRAM) - Next Diffusion, but I got:

Any idea?

I think it’s just a VRAM shortage: the workflow is allocating more VRAM than the card has. You may be able to avoid it by adjusting the settings.

(Especially if you’re using Windows,) other programs besides ComfyUI might be consuming VRAM, so keep an eye on those too. Even web browsers can sometimes consume a bit of VRAM as well…


Cause: your GPU ran out of VRAM. The error comes from the sampler step where the UNet runs over the full latent video tensor. In video, memory scales with width × height × frames × batch. A small increase in any of those can OOM an 8 GB card. (GitHub)
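As a back-of-envelope illustration, here is a small Python sketch of how the latent tensor grows with those four dimensions. The compression factors and channel count are placeholder assumptions for a Wan-style video VAE, not exact Wan2.2 internals:

```python
# Back-of-envelope latent-tensor size for a video diffusion model.
# The compression factors and channel count below are illustrative
# assumptions for a Wan-style video VAE, not exact Wan2.2 internals.

def latent_memory_mb(width, height, frames, batch=1,
                     channels=48, spatial=16, temporal=4,
                     bytes_per_elem=2):  # 2 bytes = fp16
    lat_w = width // spatial
    lat_h = height // spatial
    lat_t = 1 + (frames - 1) // temporal  # first frame + compressed rest
    elems = batch * channels * lat_t * lat_h * lat_w
    return elems * bytes_per_elem / 1024 ** 2

# The latent itself is small; the sampler's working activations are a
# large multiple of it, so every dimension still scales total memory.
print(latent_memory_mb(672, 384, 33))    # small 8 GB-friendly preset
print(latent_memory_mb(1280, 704, 121))  # full-size preset, many times larger
```

Whatever the exact constants, the point is that memory is multiplicative: doubling any one of width, height, or frames doubles the whole allocation.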

Fix it fast on 8 GB (clear steps)

  1. Use the GGUF UNet, not FP16
    In your workflow, swap any “Load Diffusion Model” node for Unet Loader (GGUF) from the ComfyUI-GGUF custom node pack. Point it at Wan2.2-TI2V-5B-*.gguf in ComfyUI/models/unet/. If the FP16 .safetensors UNet stays wired in, you will OOM. (GitHub)

  2. Install a quantized 5B build
    Grab the Wan2.2-TI2V-5B-GGUF package. Quantized GGUF variants reduce VRAM at a small quality cost. Place UNet in models/unet, VAE in models/vae, and UMT5 text encoder in models/text_encoders. (Hugging Face)

  3. Start with a tiny workload
    Set the latent node to ~672×384 and 33 frames @ 24 fps, batch = 1 in both the latent/video node and KSampler. Increase later. The official Wan 2.2 page shows where to change length (frames) and confirms the 5B fits on 8 GB with native offloading. (ComfyUI)

  4. Turn on low-VRAM runtime knobs
    In ComfyUI Settings → Server config: set VRAM management mode to auto or lowvram. Consider lowering reserved VRAM if you set it high. These controls exist to prevent OOM. (ComfyUI)

  5. Keep extras off the GPU
    Text encoder and VAE can stay on CPU if VRAM is tight. The official template already uses offloading; use it as your base. (ComfyUI)

  6. About “Lightning” speedups
    LightX2V Wan2.2-Lightning currently documents 4-step distillation for the A14B models. TI2V-5B 4-step support is listed as “Todo,” so don’t expect a working 5B Lightning LoRA yet. (Hugging Face)

  7. If you still OOM
    Reduce width/height first, then frames, then steps. Double-check the UNet loader is the GGUF node, not FP16. The KSampler “Allocation on device” message is a straight VRAM-exceeded signal. (GitHub)
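The reduction order in the last step can be sketched as a loop: shrink resolution until a memory estimate fits the budget, then trim frames. `estimate_gb` and its constant are hypothetical stand-ins for real profiling, not a ComfyUI API:

```python
# Sketch of the reduction order: shrink width/height first, then
# frames, until an estimated cost fits the VRAM budget.
# estimate_gb() and its constant are hypothetical stand-ins for real
# profiling, not a ComfyUI API.

def estimate_gb(w, h, frames, per_pixel_frame_gb=2.2e-7):
    # Toy model: activation memory roughly proportional to w * h * frames.
    return w * h * frames * per_pixel_frame_gb

def fit_to_budget(w, h, frames, budget_gb=7.0):
    while estimate_gb(w, h, frames) > budget_gb and w > 320:
        # Shrink resolution by ~25%, snapped down to multiples of 16.
        w = int(w * 0.75) // 16 * 16
        h = int(h * 0.75) // 16 * 16
    while estimate_gb(w, h, frames) > budget_gb and frames > 9:
        frames -= 8  # only cut frames once resolution is already small
    return w, h, frames

print(fit_to_budget(1280, 704, 121))  # settles well under the budget
```

Steps come last because they cost time, not memory; the loop above only touches the dimensions that actually allocate VRAM.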

A clean, known-good preset for 8 GB

  • Model: Wan2.2-TI2V-5B-GGUF via Unet Loader (GGUF).
  • Resolution: 672×384.
  • Frames: 33 at 24 fps.
  • Batch: 1.
  • Steps: 12–16 to start.
  • CFG: ~2.0.
  • VRAM mode: auto or lowvram.
  • Template: start from the official Wan2.2 TI2V 5B ComfyUI workflow so native offloading is applied. (ComfyUI)
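A quick duration sanity check for this preset: seconds = frames / fps. Wan-style models typically expect a frame count of the form 4n + 1 (an assumption here; check the length widget in your template):

```python
# Duration math for the preset: seconds = frames / fps. The 4n + 1
# frame-count constraint is an assumption about Wan-style models;
# check the length widget in your own template.

def duration_s(frames, fps=24):
    return frames / fps

def frames_for(seconds, fps=24):
    raw = round(seconds * fps)
    return raw - (raw - 1) % 4  # snap down to the nearest 4n + 1

print(duration_s(33))    # 1.375 -> just under 1.5 s of video
print(frames_for(3.0))   # 69 frames for roughly 3 s at 24 fps
```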

Why this works (background)

  • Video UNet activations dominate memory. Lowering spatial size or frame count shrinks the latent tensor and the working activations the sampler allocates. That is why the error appears at KSampler. (GitHub)
  • Quantized GGUF UNets trade precision for lower memory, letting the 5B model run on 8 GB when FP16 cannot. The ComfyUI-GGUF node loads these UNets directly. (GitHub)
  • The official ComfyUI Wan 2.2 guide states the 5B workflow fits 8 GB with ComfyUI’s native offloading and shows where to change length and sizes. (ComfyUI)
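The quantization point can be made concrete with weight-memory arithmetic for a 5B-parameter UNet. The bits-per-weight figures below are rough assumptions for common GGUF quant levels, not measured file sizes:

```python
# Weight-memory arithmetic for a 5B-parameter UNet. The bits-per-weight
# figures are rough assumptions for common GGUF quant levels, not
# measured file sizes.

params = 5e9
sizes_gb = {}
for name, bits in [("FP16", 16), ("Q8_0", 8.5), ("Q5_K_M", 5.7), ("Q4_K_M", 4.8)]:
    sizes_gb[name] = params * bits / 8 / 1024 ** 3
    print(f"{name}: ~{sizes_gb[name]:.1f} GB of weights")
```

FP16 weights alone roughly overflow an 8 GB card before any activations are counted, while a 4-bit quant leaves several gigabytes of headroom for the sampler.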

Quick checks you can do in one minute

  • In the graph, search for “GGUF”. If you don’t see Unet Loader (GGUF), fix that first. (RunComfy)
  • Open Settings → Server config. Confirm VRAM mode is not highvram. (ComfyUI)
  • Open the latent/video node and set length=33. Then run. (ComfyUI)

Short, curated references

Core setup

  • Official Wan 2.2 ComfyUI page. Templates, file paths, and where to change resolution/frames. Notes that 5B fits 8 GB. (ComfyUI)
  • ComfyUI-GGUF custom node. Required to load GGUF UNets. (GitHub)
  • GGUF model cards for Wan 2.2 5B and collection. Shows GGUF availability and placement. (Hugging Face)

Troubleshooting

  • KSampler “Allocation on device … out of memory” discussion. Confirms it is a VRAM exhaustion error. (GitHub)
  • Server config docs. VRAM management modes and what they do. (ComfyUI)

Speed options and limits

  • LightX2V Wan2.2-Lightning. 4-step distillation exists for A14B; TI2V-5B is listed as a future target. (Hugging Face)

Hello,

Thank you so much for your reply.

I changed the resolution to 768x768 and the problem was fixed, but the output video is only 2 seconds long. If I use the Wan2.2-TI2V-5B-GGUF model, is it possible to produce a longer video?