Hi all.
I am not seeing a substantial training-time improvement with LoRA.
Details:
- My dataset has ~2M training samples.
- When I fine-tune the roberta-base model (the base model has ~125M trainable parameters), training time is roughly 30.5 hours.
- When I fine-tune roberta-base with LoRA (r=8, lora_alpha=8, ~2M trainable parameters), training time is roughly 29 hours. A sketch of this configuration follows the list.
- Batch size, number of GPUs, and number of epochs are the same for both runs.
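
For context, here is a minimal sketch of this kind of LoRA setup, assuming the Hugging Face peft library (the r and lora_alpha match the run above; the target modules, dropout, and task type are illustrative assumptions, not necessarily the exact script):

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

# Load the full roberta-base model (~125M parameters).
model = AutoModelForSequenceClassification.from_pretrained("roberta-base")

# LoRA configuration matching the run above; target_modules, lora_dropout,
# and task_type are illustrative assumptions.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,
    lora_alpha=8,
    lora_dropout=0.1,
    target_modules=["query", "value"],  # attention projections in RoBERTa
)

# Wrap the base model; only the LoRA adapter weights stay trainable (~2M).
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```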
Are we guaranteed to see a substantial improvement in training time when using LoRA on large datasets?
Can anyone please offer some guidance on this?