Memory error while fine-tuning the quantized sarvamai/OpenHathi-7B-Hi-v0.1-Base model

Hello,

I am trying to fine-tune the sarvamai/OpenHathi-7B-Hi-v0.1-Base model. While fine-tuning its quantized (GPTQ 4-bit) version, I am getting the following error:

---------------------------------------------------------------------------
OutOfMemoryError                          Traceback (most recent call last)
<ipython-input-37-3435b262f1ae> in <cell line: 1>()
----> 1 trainer.train()

29 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py in forward(self, hidden_states, attention_mask, position_ids, past_key_value, output_attentions, use_cache, **kwargs)
    388         value_states = repeat_kv(value_states, self.num_key_value_groups)
    389 
--> 390         attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)
    391 
    392         if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):

OutOfMemoryError: CUDA out of memory. Tried to allocate 406.00 MiB. GPU 0 has a total capacty of 39.56 GiB of which 266.81 MiB is free. Process 64479 has 39.29 GiB memory in use. Of the allocated memory 37.52 GiB is allocated by PyTorch, and 1.28 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
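
The end of the traceback suggests tuning the allocator via PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of how that environment variable could be set before loading the model (the 512 MB value is only an example, not something I have verified helps here):

import os

# Allocator hint from the error message; it must be set before the first CUDA
# allocation, i.e. before the model is loaded. 512 MB is an arbitrary example value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"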

Here is the code that I have tried.

from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig, TrainingArguments

# model_id = "sarvamai/OpenHathi-7B-Hi-v0.1-Base"
model_id = "openhathi-gptq-4bit"  # GPTQ 4-bit quantized model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# OpenHathi is Llama-based, so load its config with AutoConfig
configuration = AutoConfig.from_pretrained(model_id)
configuration.output_hidden_states = True

training_arguments = TrainingArguments(
    # output_dir="/content/drive/MyDrive/CB/LLM/Falcon-7b-MCQ-sample_dataset-model/finetuned_model/SFT_tuning_with_first_two_modules"
    output_dir="/content/drive/MyDrive/",
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,
    optim="paged_adamw_32bit",
    evaluation_strategy="epoch",
    num_train_epochs=6,
    save_strategy="epoch",
    logging_steps=100,
    learning_rate=1e-4,
    fp16=True,
    max_grad_norm=0.3,
    group_by_length=True,
    warmup_ratio=0.03,
    lr_scheduler_type="constant",
)

from trl import SFTTrainer
max_seq_length = 2048

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset_train,
    eval_dataset=dataset_val,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()