```python
import torch

# Reset the peak-memory counter so the peak reflects only this run
torch.cuda.reset_peak_memory_stats()
start_mem = torch.cuda.memory_allocated()

# Run inference
output = model.generate(input_ids, max_new_tokens=256)

# After inference, read the current and peak allocations (in bytes)
end_mem = torch.cuda.memory_allocated()
peak_mem = torch.cuda.max_memory_allocated()
```
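Note that `max_memory_allocated()` reports the peak since the last `reset_peak_memory_stats()` call, so resetting first is what makes the peak specific to this `generate` call. The values are raw byte counts; a minimal sketch for printing them in MiB:

```python
# Convert byte counts to MiB for readability
to_mib = lambda b: b / 1024**2
print(f"start: {to_mib(start_mem):.1f} MiB")
print(f"end:   {to_mib(end_mem):.1f} MiB")
print(f"peak:  {to_mib(peak_mem):.1f} MiB")
```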