Some fun findings today!
Here is a comparison of prices and specs for Azure VMs and AWS SageMaker:
But more importantly, here is a breakdown of all the 64+ GiB RAM models, sorted by the hourly price per GiB of GPU RAM:
I just thought this was neat: ND96amsr has 10x as much GPU RAM as NC24s but is "only" 2.6x as expensive, so it should be cheaper to run on a large dataset in the long run, since it should hopefully take significantly less time to train. Also, we have been keeping our training input lengths at about 1,000 tokens, because training time scales quadratically with sequence length, not linearly; bumping up to 2,000 tokens would take roughly four times as long. With a more powerful machine, we can hopefully raise the training input lengths from 1,000 to 2,000 tokens while still staying within our comfort zone on cost and time.
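Here is a quick back-of-envelope check of that reasoning, using only the ratios above (the absolute prices are normalized to 1.0, and the 10x-speedup figure is the optimistic assumption we'd want to verify):

```python
# Normalized hourly prices: NC24s = 1.0, ND96amsr ~2.6x (from the comparison above)
nc24s_hourly = 1.0
nd96_hourly = 2.6 * nc24s_hourly

# Optimistic assumption: 10x the GPU RAM translates to a 10x shorter run
nc24s_time = 1.0
nd96_time = nc24s_time / 10

# Total cost per training run = hourly price * wall-clock time
nc24s_cost = nc24s_hourly * nc24s_time
nd96_cost = nd96_hourly * nd96_time
print(nd96_cost / nc24s_cost)  # 0.26 -> ~74% cheaper per run under this assumption

# Quadratic scaling in sequence length: 1,000 -> 2,000 tokens
time_factor = (2000 / 1000) ** 2
print(time_factor)  # 4.0 -> roughly 4x the training time on the same hardware
```

So even if the real speedup falls a bit short of 10x, there is a lot of headroom before the bigger instance stops being the cheaper option per run.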
Anyways, just wanted to share because I thought these comparisons were neat! I am really curious to run an experiment to see whether the instance with 10x the GPU RAM actually trains 10x as fast, or somewhat less than that because of synchronization overhead.
Let me know if you think there are better options we should consider!