VRAM usage during my pipeline run looks like this:
It uses a bunch of memory when I load the pipeline with from_pretrained, which is expected. Then more memory when I call the pipeline. Then it holds steady for a while as the pipeline iterates through its steps, before making one smaller jump later.
I assume that last one is when the VAE is invoked to decode the output.
I left the numbers off this chart, but the total range of the Y-axis is 8 GB, which makes that last jump something like half a gig.
While that seems like a lot for a measly 0.75 MB worth of pixel data, it’s not the amount itself that concerns me; half a gig is modest compared to the overall needs of the pipeline. My question is why the allocator grabs more memory at that point at all.
The diffusion model is done by then; shouldn’t the allocator be able to reclaim more than enough memory from that?
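For concreteness, a trace like this can be reproduced by logging torch.cuda.memory_allocated and torch.cuda.memory_reserved around the obvious stages. This is just a sketch, not the exact script behind the chart; the pipeline class, model ID, and prompt are placeholders, and the exact numbers will depend on your GPU and pipeline:

```python
import torch
from diffusers import StableDiffusionPipeline

def log_vram(stage: str) -> None:
    # memory_allocated: bytes currently occupied by live tensors;
    # memory_reserved: bytes the caching allocator has grabbed from the driver.
    alloc = torch.cuda.memory_allocated() / 2**30
    reserved = torch.cuda.memory_reserved() / 2**30
    print(f"{stage:<22} allocated={alloc:.2f} GiB  reserved={reserved:.2f} GiB")

log_vram("before load")

# First jump: the model weights land on the GPU here.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
log_vram("after from_pretrained")

# Second jump, then the plateau across the denoising steps, then the smaller
# late jump when the VAE decodes the latents into pixels.
# (A 512x512 RGB image is 512 * 512 * 3 bytes, i.e. roughly 0.75 MB of pixel data.)
image = pipe("a photo of an astronaut riding a horse").images[0]
log_vram("after pipeline call")
```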
I guess the answer is that the allocator is grabbing more memory because it can: there’s no memory pressure yet. Throwing a torch.cuda.empty_cache() in there before that stage seems to confirm this. Memory then goes down a tad at that point instead of up, and, what’s more, it stays at that lower level for the next pipeline run.
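One way to wedge that empty_cache() call in before the decode is to ask the pipeline for latents and run the VAE yourself. Again a sketch, continuing from the snippet above, and not necessarily how you’d want to patch a real pipeline:

```python
import torch

# Stop the pipeline before the VAE so we control when the decode happens.
latents = pipe(
    "a photo of an astronaut riding a horse", output_type="latent"
).images

log_vram("after denoising")
torch.cuda.empty_cache()   # hand cached-but-unused blocks back to the driver
log_vram("after empty_cache")

# The decode now has to be served from freshly reclaimed memory rather than
# prompting the allocator to reserve more.
with torch.no_grad():
    decoded = pipe.vae.decode(
        latents / pipe.vae.config.scaling_factor, return_dict=False
    )[0]
image = pipe.image_processor.postprocess(decoded)[0]
log_vram("after VAE decode")
```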
So maybe I’ve answered my own question, but I’m still confused about that big chunk of memory that isn’t allocated during pipeline init but appears on the first forward call, and I’m not sure if or when it is reclaimed.
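One way to poke at that, sketched below with the same pipe as before, is to compare allocated against reserved memory around a first and second call. If that chunk shows up in reserved but not in allocated once the call returns, it’s the caching allocator holding on to workspace rather than live tensors, and it should only go back to the driver on empty_cache() or under memory pressure:

```python
import torch

def snapshot(stage: str) -> None:
    print(
        f"{stage:<22}"
        f" allocated={torch.cuda.memory_allocated() / 2**30:.2f} GiB"
        f" reserved={torch.cuda.memory_reserved() / 2**30:.2f} GiB"
        f" peak={torch.cuda.max_memory_allocated() / 2**30:.2f} GiB"
    )

torch.cuda.reset_peak_memory_stats()
snapshot("before first call")

pipe("first prompt")     # the big first-forward chunk gets grabbed somewhere in here
snapshot("after first call")

pipe("second prompt")    # steady state: this run mostly reuses already-reserved blocks
snapshot("after second call")

# If reserved stays flat across the two calls while allocated drops back to
# roughly the post-init level after each one, that chunk is allocator cache,
# not live tensors. torch.cuda.memory_summary() breaks it down further.
```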