Hello,
I am trying to fine-tune the sarvamai/OpenHathi-7B-Hi-v0.1-Base model.
While fine-tuning the quantized version of sarvamai/OpenHathi-7B-Hi-v0.1-Base, I am getting the following error:
---------------------------------------------------------------------------
OutOfMemoryError Traceback (most recent call last)
<ipython-input-37-3435b262f1ae> in <cell line: 1>()
----> 1 trainer.train()
29 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py in forward(self, hidden_states, attention_mask, position_ids, past_key_value, output_attentions, use_cache, **kwargs)
388 value_states = repeat_kv(value_states, self.num_key_value_groups)
389
--> 390 attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)
391
392 if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
OutOfMemoryError: CUDA out of memory. Tried to allocate 406.00 MiB. GPU 0 has a total capacty of 39.56 GiB of which 266.81 MiB is free. Process 64479 has 39.29 GiB memory in use. Of the allocated memory 37.52 GiB is allocated by PyTorch, and 1.28 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
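The error message suggests setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF to avoid fragmentation, although in my case most of the memory (37.52 GiB) is genuinely allocated and only 1.28 GiB is reserved but unallocated, so I am not sure fragmentation is really the problem. As I understand it, the variable has to be set before the first CUDA allocation; this is roughly how I would try it (the value 512 is just a guess on my part):

import os

# must be set before torch allocates any CUDA memory (ideally at the very top of the notebook)
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"  # value is a guess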
Here is the code that I have tried:
# model_id = "sarvamai/OpenHathi-7B-Hi-v0.1-Base"
model_id = "openhathi-gptq-4bit" # quantize model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
configuration = OpenAIGPTConfig.from_pretrained("openhathi-gptq-4bit")
configuration.output_hidden_states = True
training_arguments = TrainingArguments(
    # output_dir="/content/drive/MyDrive/CB/LLM/Falcon-7b-MCQ-sample_dataset-model/finetuned_model/SFT_tuning_with_first_two_modules"
    output_dir="/content/drive/MyDrive/",
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,
    optim="paged_adamw_32bit",
    evaluation_strategy="epoch",
    num_train_epochs=6,
    save_strategy="epoch",
    logging_steps=100,
    learning_rate=1e-4,
    fp16=True,
    max_grad_norm=0.3,
    group_by_length=True,
    warmup_ratio=0.03,
    lr_scheduler_type="constant",
)
from trl import SFTTrainer

max_seq_length = 2048

# dataset_train, dataset_val and peft_config are created in earlier cells (not shown)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset_train,
    eval_dataset=dataset_val,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
)
model.config.use_cache = False # silence the warnings. Please re-enable for inference!
trainer.train()
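Is there something wrong with my setup, or does this configuration simply not fit in 40 GB? Since the OOM happens inside the attention matmul, the only memory-related changes I can think of trying are a smaller batch size, gradient checkpointing, and a shorter max_seq_length, roughly like this (untested, and I am not sure it is the right approach):

# memory-reduction tweaks I am planning to try
training_arguments = TrainingArguments(
    output_dir="/content/drive/MyDrive/",
    per_device_train_batch_size=1,       # halve the batch size
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,       # keep the effective batch size at 4
    gradient_checkpointing=True,         # trade compute for activation memory
    optim="paged_adamw_32bit",
    evaluation_strategy="epoch",
    num_train_epochs=6,
    save_strategy="epoch",
    logging_steps=100,
    learning_rate=1e-4,
    fp16=True,
    max_grad_norm=0.3,
    group_by_length=True,
    warmup_ratio=0.03,
    lr_scheduler_type="constant",
)

max_seq_length = 1024  # shorter sequences so the attention matrices are smaller

Would this be enough, or am I missing something more fundamental (for example in how the GPTQ model is loaded)?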