GPU memory usage increasing every loop when running inference

I have been trying to run inference with Llama 2-13B on Colab using the following code.

with torch.no_grad():
    for context, multi, single, second in zip(all_items, multi_hop_items, single_hop_items, second_hop_items):
        # prompt_multi is built from the loop variables (construction omitted here)
        inputs_multi = tokenizer(prompt_multi, return_tensors="pt").to("cuda")
        generated_ids_multi = model.generate(**inputs_multi, max_length=4096)

        outputs_multi = tokenizer.batch_decode(generated_ids_multi, skip_special_tokens=True)
        answer_multi = outputs_multi[0]
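
For reference, a print like this at the end of each iteration makes the growth visible (an illustrative snippet, not part of the loop above):

print(f"allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB, "
      f"reserved:  {torch.cuda.memory_reserved() / 1e9:.2f} GB")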

There is an increase in GPU RAM on pretty much every loop iteration. And if I interrupt the cell mid-run, the allocated GPU memory stays at that level, so I go out of memory very quickly. I have also tried running a pipeline over a dataset, as described on the Hugging Face website, and I seem to hit a very similar issue there; a rough sketch of that attempt is below.
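
(This sketch is approximately what I tried, not my exact code; prompt_list is a placeholder for however the prompts get built.)

from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
from datasets import Dataset

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)
prompt_dataset = Dataset.from_dict({"text": prompt_list})  # prompt_list is a placeholder

# Stream the dataset through the pipeline, as in the Hugging Face docs
for out in pipe(KeyDataset(prompt_dataset, "text"), batch_size=1, max_new_tokens=512):
    answer = out[0]["generated_text"]  # memory climbs here much like in the manual loop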
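
I would have expected torch.no_grad() to stop anything from accumulating across iterations. Is explicit per-iteration cleanup along these lines what I am supposed to add? (A sketch of what I mean, not something I have confirmed works:)

import gc
import torch

# At the end of each iteration: drop references, then release cached blocks
del inputs_multi, generated_ids_multi
gc.collect()
torch.cuda.empty_cache()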