From the blog post Spread Your Wings: Falcon 180B is here, there's a breakdown of the instances and memory needed for full fine-tuning.
Is there a guide on how to do that in SageMaker?
The questions, in parts:
- Does running a simple `train_clm.py` script work for full fine-tuning? (A rough sketch of the launch I have in mind is below this item.)
  - There's an example like https://nbviewer.org/github/huggingface/notebooks/blob/main/sagemaker/28_train_llms_with_qlora/scripts/run_clm.py, but it runs on 1 instance and uses QLoRA for fine-tuning.
  - If not, are there examples of doing distributed full fine-tuning? Would this work? Jupyter Notebook Viewer
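To make the question concrete, here's roughly the launch I'm imagining. This is only a sketch, not something I've run on 180B: the IAM role, container versions, hyperparameter names, and S3 paths are placeholders, and the script is assumed to be the `run_clm.py` from the notebook above, adapted for full fine-tuning rather than QLoRA.

```python
import sagemaker
from sagemaker.huggingface import HuggingFace

role = sagemaker.get_execution_role()  # or an explicit IAM role ARN

estimator = HuggingFace(
    entry_point="run_clm.py",          # script from the notebook, adapted for full fine-tuning
    source_dir="./scripts",            # folder with the script and a requirements.txt
    instance_type="ml.p4d.24xlarge",   # 8 x A100 40GB per instance
    instance_count=8,                  # "8 x 8 x A100" -> 8 instances
    role=role,
    transformers_version="4.28",       # placeholder; pick a supported DLC version combination
    pytorch_version="2.0",
    py_version="py310",
    # Data parallel is shown here only to illustrate the launch mechanics;
    # which "distribution" setting actually fits 180B is the next question.
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
    hyperparameters={
        "model_id": "tiiuae/falcon-180B",     # assumed argument names; match the script's argparse
        "epochs": 1,
        "per_device_train_batch_size": 1,
    },
)

# "training" is an assumed channel name; the S3 path is a placeholder.
estimator.fit({"training": "s3://my-bucket/falcon-train-data"})
```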
- What configurations are available when using the "distribution" argument for full fine-tuning? (The options I've pieced together are sketched below.)
  - `"smdistributed": {"dataparallel": {"enabled": True}}` won't work easily for 180B, right?
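For reference, these are the two distribution configs I understand the estimator accepts for this case. The model-parallel parameter names follow the SageMaker model parallelism docs, but the degrees are my guesses, not a tested 180B recipe:

```python
# (a) SageMaker data parallel: every GPU holds a full model replica, which is
#     why I assume it cannot fit a 180B model on 40GB A100s.
dp_distribution = {"smdistributed": {"dataparallel": {"enabled": True}}}

# (b) SageMaker model parallel (SMP) plus MPI: shards the model and optimizer
#     state across GPUs and instances. Parameter names are from the SMP docs;
#     the degrees below are placeholders and would need tuning for 180B.
mp_distribution = {
    "smdistributed": {
        "modelparallel": {
            "enabled": True,
            "parameters": {
                "pipeline_parallel_degree": 4,   # placeholder
                "tensor_parallel_degree": 8,     # placeholder
                "microbatches": 4,               # placeholder
                "ddp": True,                     # data parallel across the remaining ranks
                "shard_optimizer_state": True,   # spread optimizer state across data-parallel ranks
            },
        }
    },
    "mpi": {"enabled": True, "processes_per_host": 8},  # one process per GPU on p4d
}
```

My understanding is that sharded data parallelism (the `sharded_data_parallel_degree` parameter in newer SMP versions) is another option, but I'd appreciate confirmation on which of these is the intended path for 180B.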
- "8 x 8 x A100" would be 8 counts of `ml.p4d.24xlarge` on SageMaker, is that correct?
- 7,000,000 GPU hours on "8 x 8 x A100": would that equate to the following? (I've sketched the arithmetic below.)
  - 7,000,000 GPU hours / 8 instances / 8 GPUs per instance = 109,375 wall-clock hours on "8 x 8 x A100"
  - 109,375 hours / 24 hours a day / 365 days a year ~= 12.5 years on "8 x 8 x A100"
  - so, to train for as many GPU hours as the original run, a full fine-tune on "8 x 8 x A100" would take ~12.5 years?
  - and to achieve the same amount of training in, say, 1 year, we would need roughly 96-100 instance counts of `ml.p4d.24xlarge`?
  - and if we take $19.22 per hour per instance, with 96 instances for a full year, the total cost to train the model would be around $19.22 per instance per hour * 24 hours a day * 365 days * 96 instances ~= US$16 million (if we're budgeting for full fine-tuning of a similar model, would $16M be an appropriate number?)
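Just to show my work on the arithmetic (the $19.22/hour instance price is my own assumption, not a quoted SageMaker figure):

```python
# Pure arithmetic behind the bullets above; the $19.22/hr price is an assumption.
gpu_hours = 7_000_000
instances, gpus_per_instance = 8, 8

wall_clock_hours = gpu_hours / (instances * gpus_per_instance)       # 109,375 hours
years_on_8_instances = wall_clock_hours / (24 * 365)                 # ~12.5 years

# ml.p4d.24xlarge instances needed to burn the same GPU hours in ~1 year
instances_for_one_year = gpu_hours / gpus_per_instance / (24 * 365)  # ~100 (I rounded to 96 above)

price_per_instance_hour = 19.22
annual_cost_96_instances = price_per_instance_hour * 24 * 365 * 96   # ~$16.2M

print(wall_clock_hours, round(years_on_8_instances, 1),
      round(instances_for_one_year), round(annual_cost_96_instances))
```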
Thank you in advance for the information! Looking forward to hearing from anyone with more information on how to do full fine-tuning on the 180B model.