Hi!
I am trying to do a simple ORPO fine-tuning using a very small dataset: "celsowm/auryn_dpo_orpo".
The problem is that when I try to do this after training:
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
model, tokenizer = setup_chat_format(model, tokenizer)
model = PeftModel.from_pretrained(model, new_model)
model = model.merge_and_unload()
I get an OOM error!
The complete Kaggle notebook is here: https://www.kaggle.com/code/celsofontes/fine-tunning-orpo
Any hints?
This worked for me:
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
gc.collect()
model, tokenizer = setup_chat_format(model, tokenizer)
gc.collect()
model = PeftModel.from_pretrained(model, new_model)
gc.collect()
model = model.merge_and_unload()
gc.collect()
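One thing worth noting about the snippet above: gc.collect() can only reclaim objects that have no live references left, so it is the reassignments of model on each line that actually let the old copies go. A minimal pure-Python sketch of that behavior (the Blob class is a hypothetical stand-in for a large model object, not part of the original code):

```python
import gc
import weakref

class Blob:
    """Hypothetical stand-in for a large model object."""
    pass

b = Blob()
ref = weakref.ref(b)  # weak reference lets us observe when the object dies

gc.collect()
print(ref() is None)  # False: we still hold `b`, so gc cannot free it

del b                 # drop the last reference, as reassigning `model` does
gc.collect()
print(ref() is None)  # True: now the object has been reclaimed
```

On the CUDA side there is an extra wrinkle: PyTorch caches freed GPU blocks in its own allocator, so calling torch.cuda.empty_cache() after the del is what returns that memory to the driver.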
Unfortunately it is still not working:
CUDA out of memory. Tried to allocate 1.96 GiB. GPU 0 has a total capacty of 15.89 GiB of which 670.12 MiB is free. Process 2065 has 15.22 GiB memory in use. Of the allocated memory 14.80 GiB is allocated by PyTorch, and 120.73 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
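The last suggestion in that error message can be tried directly. A minimal sketch, assuming the variable is set before torch initializes CUDA; the value 128 is illustrative, not a recommendation:

```python
import os

# Must be set before the first CUDA allocation (ideally before importing torch).
# Smaller max_split_size_mb values reduce fragmentation of large allocations.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

Alternatively, export it in the shell before launching the notebook. This mitigates fragmentation, but it will not help if the merged model genuinely does not fit in the 16 GiB card.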
I even tried the new offload_buffers parameter:
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map="auto",
    offload_buffers=True,
)
But when I tried to merge, I hit the OOM again.
I've found the solution: instead of
device_map="auto"
use
device_map="cpu"
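Putting the whole merge step together with that change, a sketch of the approach (it assumes base_model and new_model are the same identifiers used during training, and that the tokenizer is reloaded from the base model; the merge then runs in system RAM in float16, so it needs roughly the full model size in CPU memory rather than on the 16 GiB GPU):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from trl import setup_chat_format

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map="cpu",  # merge in RAM instead of on the GPU
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
model, tokenizer = setup_chat_format(model, tokenizer)

# Attach the trained adapter and fold its weights into the base model.
model = PeftModel.from_pretrained(model, new_model)
model = model.merge_and_unload()

# Save the merged model; it can be reloaded on GPU later for inference.
model.save_pretrained("merged_model")
tokenizer.save_pretrained("merged_model")
```

Since the merge is a one-off step, doing it on CPU costs only time, and the saved merged model can afterwards be loaded with device_map="auto" as usual.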
system
Closed May 27, 2024, 10:38am
This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.