Is there a way to find out how much GPU memory an inference is using?
For example, given a prompt of 256 tokens that I feed to the LLM, asking it to generate a response of at most 256 tokens, how can I find out how much GPU memory was used for that task?
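For concreteness, here is a minimal sketch of the kind of call I mean, assuming a Hugging Face `transformers` causal LM on a CUDA device (the model name and prompt are just placeholders). I am not sure whether `torch.cuda.max_memory_allocated()` is the right way to measure this, or whether it misses memory outside PyTorch's allocator:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; substitute whatever model you actually run.
model_name = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16
).to("cuda")

prompt = "..."  # some text that tokenizes to roughly 256 tokens
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Reset PyTorch's peak-memory counter before running the inference.
torch.cuda.reset_peak_memory_stats()

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256)

# Peak memory allocated by PyTorch tensors during generation (bytes).
peak_bytes = torch.cuda.max_memory_allocated()
print(f"Peak allocated during generate(): {peak_bytes / 1024**2:.1f} MiB")
```

Is this a reasonable way to attribute GPU memory to a single generation, or is there a better-established approach (e.g. watching `nvidia-smi` while it runs)?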