Hi everyone, I am trying to continue training Llama-2-hf (both the chat and non-chat versions) on a custom dataset crawled from the internet, using Transformers together with PEFT and QLoRA. However, across different configs I keep getting a very weird loss curve, like this:
- Chat version: [loss curve screenshot]
- Non-chat version: a similar pattern, but I can't attach the image because I'm a new member.
The configs I use for each version are below; a rough sketch of how they are wired into the code follows both blocks.
- Chat:
train_name: baseline
model_source: NousResearch
model_name: Llama-2-7b-chat-hf
bnb_cfg:
load_in_4bit: True
bnb_4bit_use_double_quant: True
bnb_4bit_quant_type: nf4
bnb_4bit_compute_dtype: torch.bfloat16
lora_cfg:
peft_type: null
auto_mapping: null
base_model_name_or_path: null
revision: null
task_type: CAUSAL_LM
inference_mode: False
r: 64
target_modules: null
lora_alpha: 32
lora_dropout: 0.05
fan_in_fan_out: False
bias: none
modules_to_save: null
init_lora_weights: True
layers_to_transform: null
layers_pattern: null
train_cfg:
per_device_train_batch_size: 8
gradient_accumulation_steps: 16
warmup_ratio: 0.03
max_steps: -1
learning_rate: 8.e-5
weight_decay: 0.0001
fp16: True
logging_steps: 1
num_train_epochs: 5
optim: paged_adamw_32bit
evaluation_strategy: steps
lr_scheduler_type: constant
do_train: True
do_eval: True
eval_steps: 200
save_strategy: steps
save_steps: 100
group_by_length: True
dataloader_num_workers: 0
dataloader_drop_last: True
ddp_find_unused_parameters: False
max_seq_length: 512
- Non-chat:
train_name: baseline
model_source: NousResearch
model_name: Llama-2-7b-hf
bnb_cfg:
load_in_4bit: True
bnb_4bit_use_double_quant: True
bnb_4bit_quant_type: nf4
bnb_4bit_compute_dtype: torch.bfloat16
lora_cfg:
peft_type: null
auto_mapping: null
base_model_name_or_path: null
revision: null
task_type: CAUSAL_LM
inference_mode: False
r: 16  # also experimented with rank = 8, but saw a similar loss pattern
target_modules: null
lora_alpha: 32
lora_dropout: 0.05
fan_in_fan_out: False
bias: none
modules_to_save: null
init_lora_weights: True
layers_to_transform: null
layers_pattern: null
train_cfg:
per_device_train_batch_size: 8
gradient_accumulation_steps: 16
warmup_ratio: 0.03
max_steps: -1
learning_rate: 8.e-5
weight_decay: 0.0001
fp16: True
logging_steps: 1
num_train_epochs: 5
optim: paged_adamw_32bit
evaluation_strategy: steps
lr_scheduler_type: constant
do_train: True
do_eval: True
eval_steps: 200
save_strategy: steps
save_steps: 100
group_by_length: True
dataloader_num_workers: 0
dataloader_drop_last: True
ddp_find_unused_parameters: False
max_seq_length: 512
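For context, these fields map onto a fairly standard Transformers + PEFT + TRL QLoRA setup, roughly like the sketch below. This is a simplified reconstruction, not my exact script: the tiny in-memory datasets, the `output_dir`, and the `"text"` column name are placeholders, and on newer trl versions `max_seq_length` / `dataset_text_field` move from `SFTTrainer` to `SFTConfig`.

```python
import torch
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer

model_id = "NousResearch/Llama-2-7b-chat-hf"  # "NousResearch/Llama-2-7b-hf" for the non-chat run

# Placeholder data for this sketch; in practice this is the crawled corpus with a "text" column.
train_dataset = Dataset.from_dict({"text": ["example document 1", "example document 2"]})
eval_dataset = Dataset.from_dict({"text": ["held-out document"]})

# bnb_cfg
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# lora_cfg (target_modules left unset so PEFT falls back to its Llama defaults, q_proj/v_proj)
lora_config = LoraConfig(
    r=64,  # 16 (or 8) for the non-chat run
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# train_cfg
training_args = TrainingArguments(
    output_dir="./baseline",  # placeholder, from train_name
    per_device_train_batch_size=8,
    gradient_accumulation_steps=16,
    warmup_ratio=0.03,
    max_steps=-1,
    learning_rate=8e-5,
    weight_decay=1e-4,
    fp16=True,
    logging_steps=1,
    num_train_epochs=5,
    optim="paged_adamw_32bit",
    evaluation_strategy="steps",
    eval_steps=200,
    lr_scheduler_type="constant",
    do_train=True,
    do_eval=True,
    save_strategy="steps",
    save_steps=100,
    group_by_length=True,
    dataloader_num_workers=0,
    dataloader_drop_last=True,
    ddp_find_unused_parameters=False,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    dataset_text_field="text",  # assuming the dataset exposes a "text" column
    max_seq_length=512,
    packing=False,
)
trainer.train()
```

The non-chat run uses the same code with only `model_id` and `r` swapped, as noted in the comments.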
Can anyone explain why this is happening and suggest how I can improve the continued-training process? Thanks a lot, everyone! I am looking forward to your responses!