Hi everyone,
I’m trying to estimate the GPU memory needed to train a 150B-parameter language model.
This seems complicated because so many factors are involved.
What should I take into account? Is there a formula for this?