Hyperparameter tuning practical guide?

Hi, I have been having problems doing hyperparameter tuning on Google Colab, where it's always the GPU that runs out of memory.

Is there any practical advice you could give me for tuning BERT models? For example, in terms of environment settings, how many GPUs do I need so I don't run out of memory?

Note that tuning on the CPU works, but it takes ages.

I am using the Trainer API with Optuna.
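
Roughly, the setup looks like the sketch below (the model name and datasets are placeholders, and `fp16` is just one memory-saving option, not something specific to my problem):

```python
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

def model_init():
    # hyperparameter_search needs model_init so each trial
    # starts from a fresh copy of the model
    return AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )

training_args = TrainingArguments(
    output_dir="hp_search",
    fp16=True,  # optional: mixed precision to reduce GPU memory use
)

trainer = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=train_dataset,  # placeholder: your tokenized train split
    eval_dataset=eval_dataset,    # placeholder: your tokenized eval split
)

# run the search with the default Optuna search space
best_run = trainer.hyperparameter_search(backend="optuna", n_trials=10)
```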

If your GPU can only take 16 as the batch size, then make sure the product of `per_device_train_batch_size` and `gradient_accumulation_steps` does not go beyond 16. You need to specify the ranges for both parameters such that no combination of values from the two ranges takes the effective batch size beyond 16, as in the sketch below.
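
One way to sketch that with the Trainer's `hyperparameter_search` is to suggest only the valid (batch size, accumulation) pairs as a single categorical, so Optuna can never sample a combination whose product exceeds 16. The `"batch_and_accum"` parameter name, the pair list, and the learning-rate range here are just illustrative:

```python
def hp_space(trial):
    # enumerate only pairs whose product stays at or below 16,
    # encoded as strings so the categorical choices stay fixed
    pair = trial.suggest_categorical(
        "batch_and_accum", ["4x1", "4x2", "4x4", "8x1", "8x2", "16x1"]
    )
    batch_size, accum_steps = (int(v) for v in pair.split("x"))
    return {
        "per_device_train_batch_size": batch_size,
        "gradient_accumulation_steps": accum_steps,
        # illustrative extra hyperparameter; adjust the range to your task
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
    }

best_run = trainer.hyperparameter_search(
    backend="optuna",
    hp_space=hp_space,
    n_trials=10,
)
```

The returned dict keys must be valid `TrainingArguments` fields, which is why the combined pair is unpacked into `per_device_train_batch_size` and `gradient_accumulation_steps` before returning.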