@shamanez I have a question: I am using 4 GPUs with 24 GB of memory each on AWS SageMaker. Even with a batch size of 2 (kept very low on purpose), training fails with a CUDA out-of-memory error after a number of iterations. Can you suggest how I can resolve this?
Also, after every training epoch it runs through the entire validation set, so the total number of iterations grows considerably. Any hints on how to solve this as well?