Hi,
I'm trying to fine-tune a 7B model (DeepSeek for now) on my own data. I followed a tutorial, added a step to filter out samples longer than 8192 tokens (to save some VRAM), and ended up with this code:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
from peft import LoraConfig, get_peft_model
from datasets import load_dataset
import os
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling
MODEL_ID = "deepseek-ai/deepseek-coder-6.7b-instruct" # or "Qwen/CodeQwen1.5-7B"
DATA_FILE = "train_data.jsonl"
OUTPUT_DIR = "./fine_tuned_model"
MAX_SEQ_LEN = 8192
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto"
)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
dataset = load_dataset("json", data_files=DATA_FILE)["train"]
def is_short_enough(example):
    ids = tokenizer(
        tokenizer.apply_chat_template(example["messages"],
                                      tokenize=False,
                                      add_generation_prompt=False),
        add_special_tokens=False,
    )["input_ids"]
    return len(ids) <= MAX_SEQ_LEN
dataset = dataset.filter(is_short_enough, num_proc=os.cpu_count())
tokenizer.pad_token = tokenizer.eos_token
def tokenize_function(example):
    prompt = tokenizer.apply_chat_template(
        example["messages"],
        tokenize=False,
        add_generation_prompt=False
    )
    return tokenizer(
        prompt,
        truncation=True,
        padding="max_length",
        max_length=MAX_SEQ_LEN,
        return_tensors=None
    )
tokenized_data = dataset.map(tokenize_function, num_proc=os.cpu_count(), remove_columns=dataset.column_names)
training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=15,
    learning_rate=1e-4,
    fp16=True,
    logging_steps=10,
    save_total_limit=1
)
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_data,
    data_collator=data_collator
)
model.config.use_cache = False
trainer.train()
os.makedirs(OUTPUT_DIR, exist_ok=True)
model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)
print(f"Model and tokenizer saved at {OUTPUT_DIR}")
My data follows this pattern:
{
    "messages": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "{\"response\": \"...\"}"}
    ]
}
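For what it's worth, this is roughly how I sanity-check what the chat template produces for one record (just a sketch; the exact rendered text depends on the chat template shipped with the model's tokenizer):
sample = dataset[0]
rendered = tokenizer.apply_chat_template(
    sample["messages"],
    tokenize=False,
    add_generation_prompt=False,
)
print(rendered[:500])  # inspect the rendered prompt text
n_tokens = len(tokenizer(rendered, add_special_tokens=False)["input_ids"])
print(n_tokens)  # same count the filter compares against MAX_SEQ_LEN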
I picked a tutorial that uses QLoRA/PEFT because, if I understood correctly, it lets me train with less VRAM, and as you can see I only have 16 GB. So my question is: is there a way to achieve what I want? And if so, what am I doing wrong? (This is the first model I've tried to fine-tune.)
Also, these are my library versions:
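Here is my rough understanding of the memory budget, using the parameter counts printed below (back-of-envelope only, please correct me if this reasoning is off):
total_params = 6_760_501_248   # from print_trainable_parameters()
lora_params = 19_988_480

base_4bit_gb = total_params * 0.5 / 1e9   # 4-bit base weights, roughly 3.4 GB
lora_fp16_gb = lora_params * 2 / 1e9      # LoRA adapters in fp16, roughly 0.04 GB
adam_fp32_gb = lora_params * 8 / 1e9      # AdamW exp_avg/exp_avg_sq in fp32, roughly 0.16 GB
print(base_4bit_gb + lora_fp16_gb + adam_fp32_gb)  # roughly 3.6 GB before activations
If that is right, whatever is left of the 16 GB has to hold the activations and gradients for a full 8192-token padded sequence, which I assume is where it blows up, but I'm not sure.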
pip list | grep -E 'torch|transformers|accelerate|trl|datasets|bitsandbytes|peft|sentencepiece'
accelerate 1.7.0
bitsandbytes 0.45.5
datasets 3.6.0
fastrlock 0.8.3
peft 0.15.2
sentencepiece 0.2.0
torch 2.1.2+cu121
torchaudio 2.1.2
torchvision 0.16.2
transformers 4.51.3
trl 0.8.6
For now I get this OOM error when I run the code:
vllm_venv/lib/python3.10/site-packages/transformers/utils/hub.py:105: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
Loading checkpoint shards: 100%|████████████████████████████████████████| 2/2 [00:11<00:00, 5.78s/it]
trainable params: 19,988,480 || all params: 6,760,501,248 || trainable%: 0.2957
No label_names provided for model class `PeftModel`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
.......
[Traceback omitted, I don't think I need to copy it]
........
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB. GPU 0 has a total capacty of 15.74 GiB of which 73.38 MiB is free. Including non-PyTorch memory, this process has 15.66 GiB memory in use. Of the allocated memory 15.01 GiB is allocated by PyTorch, and 465.39 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
0%| | 0/106605 [00:01<?, ?it/s]
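The last line of the error mentions max_split_size_mb / PYTORCH_CUDA_ALLOC_CONF. As far as I understand that only helps with fragmentation rather than total usage, but for completeness this is how I would set it (assuming I read the PyTorch allocator docs right):
import os
# Must be set before the first CUDA allocation (i.e. before loading the model).
# 128 is an arbitrary value I picked; the docs don't prescribe one.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"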