Memory explosion while using Diffusers pipeline

I am running SD-XL 1.0 inference using the following code:


import torch

from diffusers import StableDiffusionXLPipeline

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0",
                                                 torch_dtype=torch.float16,
                                                 variant="fp16",
                                                 # note: output_type is a __call__ argument, not a
                                                 # from_pretrained one; from_pretrained warns that it
                                                 # is unexpected and ignores it
                                                 output_type="latent",
                                                 use_safetensors=True)

pipe = pipe.to(device)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
output = pipe(prompt=prompt, num_inference_steps=10)

GPU memory usage stays around 8 GB until the final denoising timestep; then, right at the end of generation, it roughly doubles to about 15 GB. If I call torch.cuda.empty_cache() immediately afterwards, the extra memory is freed and usage drops back to about 8 GB. I would like to understand why this happens and how I can avoid it.

I am tracking GPU memory usage with the nvidia-smi command.
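
For reference, here is a minimal snippet (reusing pipe and prompt from the code above) that reads the same numbers from inside the process with torch.cuda.memory_allocated() and torch.cuda.memory_reserved(), both standard PyTorch APIs; nvidia-smi shows roughly the reserved figure plus CUDA context overhead:

import torch

def report(tag):
    # memory_allocated: bytes currently held by live tensors
    # memory_reserved: bytes PyTorch's caching allocator holds from the driver;
    # this (plus context overhead) is roughly what nvidia-smi displays
    alloc = torch.cuda.memory_allocated() / 2**30
    reserved = torch.cuda.memory_reserved() / 2**30
    print(f"{tag}: allocated={alloc:.2f} GiB, reserved={reserved:.2f} GiB")

report("before generation")
output = pipe(prompt=prompt, num_inference_steps=10)
report("after generation")

torch.cuda.empty_cache()  # return cached, unused blocks to the driver
report("after empty_cache")

If allocated drops back after generation while reserved stays high, the extra memory is only cached by the allocator rather than held by live tensors.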