| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| Memory increasing after hugging face generate method | 0 | 39 | November 24, 2024 |
| How can I batch LLaVa inference, so that I can use all of my GPU memory? | 0 | 1280 | January 8, 2024 |
| Accelerating inference for local HuggingFacePipeline of Llama3 | 0 | 89 | August 1, 2024 |
| Why is the tensor produced by inference so big? | 2 | 431 | April 17, 2023 |
| Memory overhead/usage calculation | 3 | 48 | June 20, 2025 |