I’m using HuggingFace’s Accelerate for offloading the parameters to SSD.
But when doing it, it always occupies the CPU thread blocking other commands from being conducted in parallel.
And I found that there’s a Nvidia GPUDirectStorage that uses NVMe’s DMA enabling the CPU to do other commands in parallel.
Does Accelerate already include this kind of function?
If not can I use it by customizing Accelerate source code?