Hi, I am enabling the use_dora flag in LoraConfig. With it disabled, the estimated training time is 16 hours; with it enabled, it shows 122 hours. All other configs are kept the same. What is causing this behaviour?
I am fine-tuning LLaMA 8B Instruct.
Here are my LoRA config and TrainingArguments (a Python sketch of the same setup follows the YAML):
lora_config:
  target_modules: "q_proj,k_proj,v_proj,o_proj,gate_proj"
  r: 32
  lora_alpha: 16
  lora_dropout: 0.05
  use_dora: true
  init_lora_weights: "gaussian"
  use_rslora: true
  freeze_layers: 0
train_params:
  learning_rate: 0.00003
  per_device_train_batch_size: 1
  per_device_eval_batch_size: 4
  num_train_epochs: 3
  gradient_accumulation_steps: 8
  max_grad_norm: 1
  eval_strategy: "steps"
  eval_steps: 0.123
  optim: 'adamw_8bit'
  save_steps: 0.123
  weight_decay: 0.01
  fp16: true
  save_strategy: "steps"
  warmup_ratio: 0.1
  logging_steps: 50
  gradient_checkpointing: false
  report_to: 'tensorboard'
  lr_scheduler_type: 'cosine'
  save_total_limit: 100
  ddp_find_unused_parameters: false
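For reference, here is a minimal Python sketch of how this YAML maps onto PEFT's LoraConfig and transformers' TrainingArguments. The model id, output_dir, and task_type are placeholders/assumptions (freeze_layers is handled separately in my training script, not by LoraConfig); only use_dora is toggled between the two runs.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

lora_config = LoraConfig(
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj"],
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    use_dora=True,                # the only flag I toggle: False -> ~16 h, True -> ~122 h
    init_lora_weights="gaussian",
    use_rslora=True,
    task_type="CAUSAL_LM",        # assumption for a causal LM fine-tune
)

training_args = TrainingArguments(
    output_dir="out",             # placeholder
    learning_rate=3e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=4,
    num_train_epochs=3,
    gradient_accumulation_steps=8,
    max_grad_norm=1.0,
    eval_strategy="steps",
    eval_steps=0.123,             # fraction < 1 is interpreted as a ratio of total steps
    optim="adamw_8bit",
    save_steps=0.123,
    weight_decay=0.01,
    fp16=True,
    save_strategy="steps",
    warmup_ratio=0.1,
    logging_steps=50,
    gradient_checkpointing=False,
    report_to="tensorboard",
    lr_scheduler_type="cosine",
    save_total_limit=100,
    ddp_find_unused_parameters=False,
)

# Load the base model and wrap it with the LoRA/DoRA adapters.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")  # placeholder id
model = get_peft_model(model, lora_config)
```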