I'm running the StableVideoDiffusionPipeline demo, using the Stable Video Diffusion documentation as a reference. The docs claim that with all the low-memory techniques enabled, VRAM consumption can be lowered to 8 GB. Yet after applying those techniques, I still get a CUDA out-of-memory error with an abnormal 39.55 GiB allocation request:
Traceback (most recent call last):
File "/mnt/workspace/lcm/test4_img2video.py", line 21, in <module>
frames = pipe(image, decode_chunk_size=1, generator=generator).frames[0]
File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/diffusers/pipelines/stable_video_diffusion/pipeline_stable_video_diffusion.py", line 499, in __call__
noise_pred = self.unet(
File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = module._old_forward(*args, **kwargs)
File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/diffusers/models/unet_spatio_temporal_condition.py", line 434, in forward
sample, res_samples = downsample_block(
File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/diffusers/models/unet_3d_blocks.py", line 2173, in forward
hidden_states = attn(
File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/diffusers/models/transformer_temporal.py", line 351, in forward
hidden_states = block(
File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/diffusers/models/attention.py", line 288, in forward
attn_output = self.attn1(
File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 522, in forward
return self.processor(
File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 743, in __call__
attention_probs = attn.get_attention_scores(query, key, attention_mask)
File "/mnt/workspace/conda_venvs/torch/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 590, in get_attention_scores
baddbmm_input = torch.empty(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 39.55 GiB (GPU 0; 31.75 GiB total capacity; 5.01 GiB already allocated; 25.05 GiB free; 5.50 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Adding or removing any of the memory-saving techniques does not change the 39.55 GiB figure, so they seem to have no effect at all.
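As a rough sanity check on that number, here is a back-of-the-envelope sketch of how large the attention-score tensor allocated in `get_attention_scores` would be. Everything in it is an assumption rather than something read from the traceback: 25 frames, a doubled batch for classifier-free guidance, 5 attention heads in the first down block, 8x VAE downsampling, and fp16 scores. The helper `attention_scores_gib` is a hypothetical name used only for illustration.

```python
def attention_scores_gib(height, width, num_frames, num_heads,
                         cfg_factor=2, bytes_per_elem=2, vae_scale=8):
    """Estimate the size (GiB) of a full spatial self-attention score tensor.

    All defaults are assumptions: cfg_factor=2 for classifier-free guidance,
    bytes_per_elem=2 for fp16, vae_scale=8 for the latent downsampling.
    """
    # Number of spatial tokens per frame in latent space
    tokens = (height // vae_scale) * (width // vae_scale)
    # Frames are folded into the batch for spatial attention, then split per head
    batch = cfg_factor * num_frames * num_heads
    # Scores have shape (batch, tokens, tokens)
    return batch * tokens * tokens * bytes_per_elem / 2**30


print(f"{attention_scores_gib(576, 1024, 25, 5):.2f} GiB")  # 39.55 GiB
print(f"{attention_scores_gib(288, 512, 25, 5):.2f} GiB")   # 2.47 GiB
```

Under these assumptions, a 1024x576 input reproduces the 39.55 GiB request exactly, while 512x288 would need only about 2.47 GiB. That suggests, though it does not prove, that the pipeline is still running at 1024x576 despite the resized input image, which would also explain why the figure never changes. If so, passing `height` and `width` explicitly to the pipeline call may be worth trying.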
My code:
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video
pipe = StableVideoDiffusionPipeline.from_pretrained(
"/mnt/aigc/sd_weights/modelscope/hub/AI-ModelScope/stable-video-diffusion-img2vid-xt",
torch_dtype=torch.float16, variant="fp16"
)
pipe.enable_model_cpu_offload()
pipe.unet.enable_forward_chunking()
# Load the conditioning image
image = load_image("./test_data/rocket.png")
# image = image.resize((1024, 576))
image = image.resize((512, 288))
generator = torch.manual_seed(42)
# frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]
frames = pipe(image, decode_chunk_size=1, generator=generator).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
Any help is appreciated.