Some fun findings today!
Here is a comparison of prices and specs for Azure VMs and AWS SageMaker:
But more importantly, here is a breakdown of all the 64+ GiB RAM models, sorted by the hourly price per GiB of GPU RAM:
I just thought this was neat: ND96amsr has 10x as much GPU RAM as NC24s but is "only" 2.6x as expensive, so it should be cheaper to run on a large dataset in the long run, since it should hopefully take significantly less time to train. Also, we have been keeping our training input lengths at about 1,000 tokens, because training time scales quadratically with sequence length, not linearly; bumping up to 2,000 tokens would take roughly four times as long. With a more powerful machine, we can hopefully raise the training input lengths from 1,000 to 2,000 tokens while still staying within our comfort zone on cost and time.
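Here is a quick back-of-envelope check of that reasoning, using only the ratios above (the absolute prices are normalized to 1.0, and the 10x-speedup figure is the optimistic assumption we'd want to verify):

```python
# Normalized hourly prices: NC24s = 1.0, ND96amsr ~2.6x (from the comparison above)
nc24s_hourly = 1.0
nd96_hourly = 2.6 * nc24s_hourly

# Optimistic assumption: 10x the GPU RAM translates to a 10x shorter run
nc24s_time = 1.0
nd96_time = nc24s_time / 10

# Total cost per training run = hourly price * wall-clock time
nc24s_cost = nc24s_hourly * nc24s_time
nd96_cost = nd96_hourly * nd96_time
print(nd96_cost / nc24s_cost)  # 0.26 -> ~74% cheaper per run under this assumption

# Quadratic scaling in sequence length: 1,000 -> 2,000 tokens
time_factor = (2000 / 1000) ** 2
print(time_factor)  # 4.0 -> roughly 4x the training time on the same hardware
```

So even if the real speedup falls a bit short of 10x, there is a lot of headroom before the bigger instance stops being the cheaper option per run.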
Anyways, just wanted to share because I thought these comparisons were neat! I am really curious to run an experiment to see whether the instance with 10x the GPU RAM actually trains 10x as fast, or somewhat less than that because of synchronization overhead.
Let me know if you think there are better options we should consider!