Finetune on Titan X Pascal

Hello everyone! I would like to finetune a language model on my personal dataset, but I am running into some problems with machine resources. I have 3 Titan X Pascal GPUs, each with 12 GB of VRAM. I first tried to finetune my model (OPT-1.3B) using fp16, but the loss didn't converge at all. Then I switched to fp32, and during those experiments I found that there was not enough memory to finetune the model with all parameters, which seemed strange to me on paper. I have also tried LoRA, but the results are not satisfactory. So I would like to know:
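For context, my fp16 attempt looked roughly like the sketch below (toy data and illustrative hyperparameters stand in for my actual dataset and settings):

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

# Simplified sketch of the fp16 full-finetune attempt.
# The toy texts and hyperparameters below are illustrative only.
model_name = "facebook/opt-1.3b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

texts = ["an example sentence from my dataset.", "another example sentence."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)
# Causal LM collator: labels are created from the input_ids.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="opt-1.3b-finetune",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=3,
    fp16=True,        # the setting under which the loss failed to converge
    logging_steps=50,
)

Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    data_collator=collator,
).train()
```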

  1. Whether the Titan X Pascal supports fp16 training.
  2. The largest model that I can finetune with all parameters on my machine.
  3. Whether the model would work better if I increased the number of trainable parameters in LoRA (I currently use rank=16 and target modules [q_proj, v_proj]; see the sketch after this list).
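For reference, this is roughly how my current LoRA setup looks. Only the rank and target modules are the settings I mentioned above; the alpha and dropout values are just the ones I happened to pick:

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                  # rank, as mentioned above
    target_modules=["q_proj", "v_proj"],   # only attention query/value projections
    lora_alpha=32,                         # illustrative value
    lora_dropout=0.05,                     # illustrative value
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # check how many parameters are actually trained
```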

Thank you very much