SAM2 video streaming – VRAM usage keeps increasing until OOM

Hi,

I’m using the SAM2 model for video streaming (SAM2 Video). With each processed frame, GPU memory usage (as reported by torch.cuda.memory_allocated()) increases steadily until the process eventually runs out of memory (CUDA OOM).

I’ve tried the following (a condensed sketch of my loop is included after the list):

  • Setting max_vision_features_cache_size=1

  • Calling reset_tracking_data() and reset_inference_session() periodically

  • Deleting all local tensors after each frame and running gc.collect() + torch.cuda.empty_cache()

  • Loading frames one-by-one from disk (no large RAM usage)
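
For reference, here is roughly what the per-frame loop looks like. `load_frame` and `run_sam2_on_frame` are placeholders for my actual processor/model calls, and the reset method is the one named in the list above; the cleanup pattern is the relevant part.

```python
import gc
import torch


def stream_frames(model, inference_session, frame_paths, reset_every=100):
    """Per-frame loop: run SAM2 on one frame, then aggressively free GPU memory.

    `load_frame` and `run_sam2_on_frame` are placeholders for the real
    processor/model calls in my script; the cleanup steps are the point here.
    """
    for idx, path in enumerate(frame_paths):
        frame = load_frame(path)  # one frame at a time from disk
        with torch.inference_mode():
            masks = run_sam2_on_frame(model, inference_session, frame)

        masks = masks.detach().cpu()  # move results off the GPU right away
        # ... consume `masks` downstream ...

        # Drop all local tensor references and release cached allocator blocks.
        del frame, masks
        gc.collect()
        torch.cuda.empty_cache()

        # Periodically clear tracking state (method name as in the list above;
        # I also tried reset_inference_session() in the same spot).
        if (idx + 1) % reset_every == 0:
            inference_session.reset_tracking_data()
```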

Despite this, allocated memory grows linearly with every frame, suggesting that something in the SAM2 streaming pipeline is keeping GPU tensors alive for all processed frames.
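
To rule out plain allocator caching, I log both allocated and reserved memory around each frame with standard PyTorch calls (nothing SAM2-specific); in my runs the allocated figure itself climbs every frame.

```python
import torch


def log_vram(tag: str) -> None:
    """Print allocated vs. reserved CUDA memory. If only `reserved` grows,
    it's just the caching allocator holding blocks; if `allocated` grows too,
    tensors are genuinely being kept alive somewhere."""
    alloc_mib = torch.cuda.memory_allocated() / 2**20
    reserved_mib = torch.cuda.memory_reserved() / 2**20
    print(f"[{tag}] allocated={alloc_mib:.1f} MiB  reserved={reserved_mib:.1f} MiB")
```

Calling `log_vram(f"frame {idx}")` at the end of each iteration is how I see the linear growth in allocated memory.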

Has anyone else experienced this? Is there a known workaround to keep VRAM usage stable during long streaming inference without reloading the model each time?

— System Info —
Platform: Linux-6.16.3-76061603-generic-x86_64-with-glibc2.35
Python: 3.11.13 | packaged by conda-forge | (main, Jun 4 2025, 14:48:23) [GCC 13.3.0]

— PyTorch and CUDA Info —
PyTorch Version: 2.7.1+cu128
Is CUDA available: True
CUDA Version: 12.8
cuDNN Version: 90701
GPU Name: NVIDIA GeForce RTX 5060 Ti

— Transformers Info —
Transformers Version: 4.57.0.dev0

It seems to be a known issue specific to SAM2.

Hey, thanks a ton for such a detailed reply! Really appreciate you breaking down the possible causes and sharing concrete steps to try. I’ll give your suggestions a go and report back once I’ve tested them out. 🙌

Thanks again for the help! 🙌
