Good analysis, but in general you need to monitor max_cuda_allocation to find the peak memory choke point during an inference call; that peak is what tells you how much VRAM is actually used.
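For anyone who wants to try this, here is a minimal sketch. It assumes PyTorch, and it assumes that "max_cuda_allocation" refers to `torch.cuda.max_memory_allocated()`; the post does not name the framework, so treat that mapping as a guess. The helper name `peak_inference_memory_mib` is made up for illustration.

```python
import torch


def peak_inference_memory_mib(fn, *args, device=None, **kwargs):
    """Run fn(*args, **kwargs) and return (result, peak_mib).

    peak_mib is the peak memory occupied by PyTorch tensors on `device`
    during the call, in MiB, or None when CUDA is not available.
    Hypothetical helper, not part of PyTorch.
    """
    if not torch.cuda.is_available():
        # No GPU: just run the call, there is nothing to measure.
        return fn(*args, **kwargs), None

    # Finish pending kernels, then reset the peak counter so that it
    # reflects this call only.
    torch.cuda.synchronize(device)
    torch.cuda.reset_peak_memory_stats(device)

    with torch.inference_mode():
        result = fn(*args, **kwargs)

    # Wait for the GPU work to complete before reading the counter.
    torch.cuda.synchronize(device)
    peak_bytes = torch.cuda.max_memory_allocated(device)
    return result, peak_bytes / 2**20


if __name__ == "__main__":
    # Tiny stand-in model; swap in your real model and input batch.
    dev = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Linear(1024, 1024).to(dev).eval()
    batch = torch.randn(8, 1024, device=dev)
    _, peak = peak_inference_memory_mib(model, batch)
    print("peak MiB:", peak)
```

Note that the counter is reset before the call, so memory already held at that point (for example the model weights) still counts toward the peak, while earlier spikes do not. `max_memory_allocated` only covers tensors managed by PyTorch's caching allocator; `torch.cuda.max_memory_reserved()` is usually closer to what nvidia-smi shows, and neither includes the CUDA context itself.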