Abnormally high VRAM required when using StableVideoDiffusionPipeline

I’m running the StableVideoDiffusionPipeline demo, using the Stable Video Diffusion docs as a reference. The docs claim that with all the low-memory techniques enabled, VRAM consumption can be lowered to 8 GB. Yet, with those techniques applied, I still get a CUDA out-of-memory error asking for an abnormal 39.55 GiB:

Traceback (most recent call last):
  File "/mnt/workspace/lcm/test4_img2video.py", line 21, in <module>
    frames = pipe(image, decode_chunk_size=1, generator=generator).frames[0]
  File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/diffusers/pipelines/stable_video_diffusion/pipeline_stable_video_diffusion.py", line 499, in __call__
    noise_pred = self.unet(
  File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/diffusers/models/unet_spatio_temporal_condition.py", line 434, in forward
    sample, res_samples = downsample_block(
  File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/diffusers/models/unet_3d_blocks.py", line 2173, in forward
    hidden_states = attn(
  File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/diffusers/models/transformer_temporal.py", line 351, in forward
    hidden_states = block(
  File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/diffusers/models/attention.py", line 288, in forward
    attn_output = self.attn1(
  File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 522, in forward
    return self.processor(
  File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 743, in __call__
    attention_probs = attn.get_attention_scores(query, key, attention_mask)
  File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 590, in get_attention_scores
    baddbmm_input = torch.empty(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 39.55 GiB (GPU 0; 31.75 GiB total capacity; 5.01 GiB already allocated; 25.05 GiB free; 5.50 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Adding or removing any of the memory-saving techniques does not change the 39.55 GiB figure, so they seem to have no effect at all. From the traceback, the allocation comes from the torch.empty call in get_attention_scores, i.e. the full attention-score tensor, which none of these options seem to touch.
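For reference, the active attention processors can be inspected like this (I believe attn_processors is the relevant UNet property; on PyTorch 1.x it should show the vanilla AttnProcessor):

# Print the set of attention processor classes in use on the UNet
print({type(p).__name__ for p in pipe.unet.attn_processors.values()})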
My code:

import torch

from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "/mnt/aigc/sd_weights/modelscope/hub/AI-ModelScope/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16, variant="fp16"
)
# Memory-saving techniques from the docs
pipe.enable_model_cpu_offload()      # keep submodules on CPU until they are used
pipe.unet.enable_forward_chunking()  # chunk the feed-forward layers over frames

# Load the conditioning image
image = load_image("./test_data/rocket.png")
# image = image.resize((1024, 576))
image = image.resize((512, 288))  # even at half resolution it still OOMs

generator = torch.manual_seed(42)
# frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]
frames = pipe(image, decode_chunk_size=1, generator=generator).frames[0]  # decode one frame at a time

export_to_video(frames, "generated.mp4", fps=7)

Any help is appreciated.

Got the issue fixed by upgrading to PyTorch 2.x.
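For anyone hitting the same wall: on PyTorch < 2.0, diffusers falls back to the vanilla attention processor, whose get_attention_scores materializes the full attention-score tensor (the torch.empty call in the traceback above). CPU offload, forward chunking, and decode_chunk_size don't shrink that allocation, which is why the 39.55 GiB figure never moved. On PyTorch 2.x, diffusers selects AttnProcessor2_0, which routes attention through torch.nn.functional.scaled_dot_product_attention and never builds that tensor. A minimal sketch (setting the processor explicitly, though 2.x picks it up automatically; the Hub model id here stands in for the local ModelScope path above):

import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.models.attention_processor import AttnProcessor2_0

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16, variant="fp16"
)
# On PyTorch >= 2.0 this memory-efficient processor is the default;
# setting it explicitly makes the attention path unambiguous.
pipe.unet.set_attn_processor(AttnProcessor2_0())
pipe.enable_model_cpu_offload()

If you have to stay on PyTorch 1.x, pipe.enable_attention_slicing() (computes the score tensor in slices) or pipe.enable_xformers_memory_efficient_attention() (with xformers installed) should reduce the peak allocation in the same spot.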

