DeepSpeed-MII library issues

I tried using the DeepSpeed-MII library to create a pipeline in Jupyter on a GPU with CUDA compute capability 8.0+, but it gives me this error:

OutOfMemoryError: CUDA out of memory. Tried to allocate 19.75 GiB. GPU 0 has a total capacity of 22.03 GiB of which 19.51 GiB is free. Including non-PyTorch memory, this process has 2.52 GiB memory in use. Of the allocated memory 1.35 GiB is allocated by PyTorch, and 5.34 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (CUDA semantics - PyTorch 2.4 documentation)

If anyone knows how to solve this error, please help me.

You could restart the process periodically to free GPU memory, or manually load and run only part of the model without relying on the library, but the easiest fix is probably to reduce the image size or the dataset size.
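
To see why reducing the size is likely the more useful direction, it can help to do the arithmetic on the numbers in the error message from the question. The snippet below is only a rough sketch: `summarize_oom` and `to_gib` are hypothetical helper names made up for this post (they are not part of PyTorch or DeepSpeed-MII), and the regular expressions assume the message wording shown above.

```python
import re

# Error text copied from the question above (only the parts needed here).
MESSAGE = (
    "CUDA out of memory. Tried to allocate 19.75 GiB. "
    "GPU 0 has a total capacity of 22.03 GiB of which 19.51 GiB is free. "
    "Of the allocated memory 1.35 GiB is allocated by PyTorch, "
    "and 5.34 MiB is reserved by PyTorch but unallocated."
)

# Conversion factors to GiB for the two units that appear in the message.
UNIT_TO_GIB = {"MiB": 1 / 1024, "GiB": 1.0}


def to_gib(value, unit):
    """Convert a (number, unit) pair such as ("5.34", "MiB") to GiB."""
    return float(value) * UNIT_TO_GIB[unit]


def summarize_oom(message):
    """Hypothetical helper: pull the key sizes out of an OOM message (in GiB)."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "free": r"of which ([\d.]+) (MiB|GiB) is free",
        "fragmented": r"and ([\d.]+) (MiB|GiB) is reserved by PyTorch but unallocated",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{name}' in the message")
        sizes[name] = to_gib(*match.groups())
    # How much more memory the single allocation needs than is currently free.
    sizes["shortfall"] = sizes["requested"] - sizes["free"]
    return sizes


if __name__ == "__main__":
    for name, gib in summarize_oom(MESSAGE).items():
        print(f"{name}: {gib:.4f} GiB")
```

With the numbers from the question, the single requested allocation (19.75 GiB) is larger than the free memory (19.51 GiB), while only 5.34 MiB is reserved but unallocated. That suggests fragmentation is probably not the main cause here, so `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` on its own may not be enough, and shrinking whatever triggers that one large allocation seems the more promising fix.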
