It makes sense that your training is not faster on g4dn.2xlarge or g4dn.4xlarge, since those instance types also have only a single GPU each. But with p3.2xlarge there should be some difference. What do you see for GPUUtilization and GPUMemoryUtilization in your training job overview?
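If you'd rather pull those metrics programmatically than read them off the console, something like the sketch below should work. It's a minimal example assuming a single-instance training job; the job name is a placeholder, and the `algo-1` host suffix assumes the default naming SageMaker uses for the first instance of a job.

```python
import boto3
from datetime import datetime, timedelta

# Placeholder -- replace with your actual training job name.
job_name = "my-training-job"

cloudwatch = boto3.client("cloudwatch")

for metric in ("GPUUtilization", "GPUMemoryUtilization"):
    # SageMaker publishes per-instance hardware metrics under the
    # /aws/sagemaker/TrainingJobs namespace, keyed by the "Host"
    # dimension (job name plus an instance suffix such as "algo-1").
    resp = cloudwatch.get_metric_statistics(
        Namespace="/aws/sagemaker/TrainingJobs",
        MetricName=metric,
        Dimensions=[{"Name": "Host", "Value": f"{job_name}/algo-1"}],
        StartTime=datetime.utcnow() - timedelta(hours=3),
        EndTime=datetime.utcnow(),
        Period=300,  # 5-minute averages
        Statistics=["Average"],
    )
    # Datapoints come back unordered, so sort by timestamp before printing.
    for p in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
        print(f"{metric} {p['Timestamp']:%H:%M} {p['Average']:.1f}%")
```

If GPUUtilization stays low on the p3.2xlarge, the GPU is likely starved (e.g. by data loading or preprocessing on the CPU), which would explain why the faster GPU doesn't speed up your training.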