Estimate GPU memory for training a 150B LLM

Hi everyone,

I’m trying to estimate the GPU memory needed to train a 150B-parameter language model.
This seems complex because many factors are involved (precision, optimizer, activations, parallelism).
What factors should I account for? Is there a formula for this?
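
A rough back-of-the-envelope sketch (my own assumption, not an exact formula): with mixed-precision Adam, each parameter commonly accounts for about 16 bytes — fp16 weights (2) + fp16 gradients (2) + fp32 master weights (4) + Adam first moment (4) + Adam second moment (4) — before activation memory, which depends on batch size, sequence length, and checkpointing:

```python
def training_memory_gb(n_params: float, bytes_per_param: float = 16) -> float:
    """Rough per-replica memory for weights, gradients, and Adam optimizer
    states in mixed precision. Ignores activations and framework overhead."""
    return n_params * bytes_per_param / 1e9

# 150B parameters at ~16 bytes/param (assumed mixed-precision Adam breakdown)
print(training_memory_gb(150e9))  # → 2400.0 (GB, before activations)
```

This is why model/tensor/pipeline parallelism and optimizer sharding (e.g. ZeRO) are needed: no single GPU holds multiple terabytes, so the state is partitioned across devices.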