Hugging Face Trainer leaves GPU memory allocated after training?

I am trying to train Llama-7B with a batch size of 1 using DeepSpeed and the Hugging Face Trainer. My GPU has 48GB of memory. Training runs fine, but afterwards about 27GB of GPU memory stays in use that was not in use just before trainer.train(). Based on nvidia-smi calls inside my script, that memory is allocated exactly when trainer.train() is called.

Deleting the model, deleting the trainer, and calling torch.cuda.empty_cache() all fail to release that memory. How can I free it so that I can continue with the rest of my script?
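
For reference, here is a minimal sketch of the cleanup described above. It is not the exact code from my script: the helper names release and cuda_memory_report are hypothetical, and it assumes the model and trainer are held in a dict-like namespace such as globals(). It also prints what PyTorch's allocator reports, since nvidia-smi counts the CUDA context and cached blocks as well as live tensors.

```python
import gc

try:
    import torch
except ImportError:  # lets the sketch run on a machine without PyTorch
    torch = None


def cuda_memory_report():
    """Return (allocated, reserved) bytes from PyTorch's allocator, or None without CUDA."""
    if torch is None or not torch.cuda.is_available():
        return None
    return torch.cuda.memory_allocated(), torch.cuda.memory_reserved()


def release(namespace, names):
    """Drop the named references (hypothetical helper), collect garbage, empty the cache."""
    for name in names:
        namespace.pop(name, None)  # same effect as `del name` for that namespace
    gc.collect()  # break reference cycles that may still hold tensors
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached, unused blocks to the driver
    return cuda_memory_report()


# Usage, assuming `trainer` and `model` are module-level variables:
# print(release(globals(), ["trainer", "model"]))
```

If memory_allocated() stays high after this, some object is presumably still holding references to the tensors. If it drops while nvidia-smi does not, the remainder may be cached blocks or the CUDA context rather than live tensors.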