How does one do full fine-tuning on Falcon 180B?

The blog post “Spread Your Wings: Falcon 180B is here” gives a breakdown of the instances and memory needed for full fine-tuning.

Is there a guide on how to do that in SageMaker?

The questions in parts:

  • Does running a simple script work for full fine-tuning?

  • What configurations are available when using the “distribution” argument for full fine-tuning?

    • For example, "smdistributed": {"dataparallel": {"enabled": True}} on its own won’t work easily for 180B, right?
  • “8 x 8 x A100” would be 8 ml.p4d.24xlarge instances on SageMaker, is that correct?

  • Would 7,000,000 GPU hours on “8 x 8 x A100” equate to the following?

    • 7,000,000 GPU hours / 8 instances / 8 GPUs per instance = 109,375 wall-clock hours on “8 x 8 x A100”
    • 109,375 hours / 24 hours a day / 365 days a year ~= 12 years on “8 x 8 x A100” (rounded down from about 12.5)
    • So a full fine-tune using as much compute as the original training run would take about 12 years on “8 x 8 x A100”?
    • And to achieve the same amount of training in, let’s say, 1 year, we would need 96 ml.p4d.24xlarge instances (12 x 8)?
      • And at $19.22 per instance per hour, with 96 instances for a full year, the total cost to train the model would be around $19.22 per instance per hour * 24 hours a day * 365 days * 96 instances ~= US$16 million? (If we’re budgeting for full fine-tuning of a similar model, would $16M be an appropriate number?)
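
As a sanity check on the arithmetic above, here is a minimal Python sketch. It uses only the figures quoted in this post (7,000,000 GPU hours, 8 GPUs per ml.p4d.24xlarge, 8 instances, $19.22 per instance-hour); these are assumptions taken from the question, not verified numbers, and the variable names are just illustrative.

```python
# Back-of-the-envelope estimate using the figures quoted in this post.
# All inputs are assumptions from the question, not verified values.
GPU_HOURS = 7_000_000            # total GPU hours quoted from the blog post
GPUS_PER_INSTANCE = 8            # A100 GPUs per ml.p4d.24xlarge
INSTANCES = 8                    # the "8 x 8 x A100" setup
PRICE_PER_INSTANCE_HOUR = 19.22  # USD per instance-hour, as used in the question
HOURS_PER_YEAR = 24 * 365

# Wall-clock time if all 64 GPUs run in parallel.
wall_clock_hours = GPU_HOURS / (INSTANCES * GPUS_PER_INSTANCE)
years_on_8_instances = wall_clock_hours / HOURS_PER_YEAR

# Instance-hours are independent of how many instances run in parallel.
instance_hours = GPU_HOURS / GPUS_PER_INSTANCE
instances_for_one_year = instance_hours / HOURS_PER_YEAR
total_cost_usd = instance_hours * PRICE_PER_INSTANCE_HOUR

print(f"Wall-clock hours on 8 instances: {wall_clock_hours:,.0f}")
print(f"Years on 8 instances: {years_on_8_instances:.2f}")
print(f"Instances needed to finish in 1 year: {instances_for_one_year:.1f}")
print(f"Total cost (USD): {total_cost_usd:,.0f}")
```

Without rounding 12.5 years down to 12, this works out to roughly 100 instances for one year and about US$16.8 million, so the 96 instances and US$16 million above are slightly low but in the same range. It also assumes perfect scaling across instances, which is an optimistic simplification.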

Thank you in advance! I look forward to hearing from anyone with more information on how to do full fine-tuning on the 180B model.