I have a GeForce GT 1030, which has 2 GB of VRAM and 384 CUDA cores. I’ve heard that people have had success running some models on these cards, but I’m wondering what happens if I try to run a larger model, say a 13B-parameter model, that requires more VRAM than I have. Does the CUDA part just get skipped (or does the code fail to run)? Or does torch break the execution into pieces and run them on the card one at a time? If it’s the latter, would I still see a speedup from using CUDA compared to running on the CPU?
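To put a number on how far over the limit a 13B model would be, here is a rough back-of-the-envelope sketch. It assumes only the weights count toward memory (activations, KV cache, and framework overhead are ignored) and uses the usual bytes-per-parameter for each precision. The function names `weight_gib` and `fits_in_vram` are made up for illustration and are not part of torch.

```python
# Rough estimate of the memory needed just to hold a model's weights.
# Assumption: weights only; activations, KV cache, and CUDA/framework
# overhead are ignored, so real usage would be higher than this.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_gib(n_params: int, dtype: str) -> float:
    """Return the weight footprint in GiB for a parameter count and dtype."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


def fits_in_vram(n_params: int, dtype: str, vram_gib: float) -> bool:
    """Return True if the weights alone would fit in the given VRAM."""
    return weight_gib(n_params, dtype) <= vram_gib


if __name__ == "__main__":
    n_params = 13_000_000_000  # the 13B model from the question
    vram_gib = 2.0             # GT 1030
    for dtype in BYTES_PER_PARAM:
        size = weight_gib(n_params, dtype)
        verdict = "fits" if fits_in_vram(n_params, dtype, vram_gib) else "does not fit"
        print(f"{dtype}: {size:.1f} GiB of weights, {verdict} in {vram_gib} GiB")
```

By this estimate, even at 8 bits per weight the model is several times larger than the card's memory, which is why I'm asking what actually happens in that case.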