Hi,
I'm trying to fine-tune a 7B model (DeepSeek for now) on my own data. I followed a tutorial, added a step to filter out samples longer than 8192 tokens (to save some VRAM), and ended up with this code:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
from peft import LoraConfig, get_peft_model
from datasets import load_dataset
import os
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling
MODEL_ID = "deepseek-ai/deepseek-coder-6.7b-instruct" # or "Qwen/CodeQwen1.5-7B"
DATA_FILE = "train_data.jsonl"
OUTPUT_DIR = "./fine_tuned_model"
MAX_SEQ_LEN = 8192
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto"
)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
dataset = load_dataset("json", data_files=DATA_FILE)["train"]
def is_short_enough(example):
    ids = tokenizer(
        tokenizer.apply_chat_template(example["messages"],
                                      tokenize=False,
                                      add_generation_prompt=False),
        add_special_tokens=False,
    )["input_ids"]
    return len(ids) <= MAX_SEQ_LEN
dataset = dataset.filter(is_short_enough, num_proc=os.cpu_count())
tokenizer.pad_token = tokenizer.eos_token
def tokenize_function(example):
    prompt = tokenizer.apply_chat_template(
        example["messages"],
        tokenize=False,
        add_generation_prompt=False
    )
    return tokenizer(
        prompt,
        truncation=True,
        padding="max_length",
        max_length=MAX_SEQ_LEN,
        return_tensors=None
    )
tokenized_data = dataset.map(tokenize_function, num_proc=os.cpu_count(), remove_columns=dataset.column_names)
training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=15,
    learning_rate=1e-4,
    fp16=True,
    logging_steps=10,
    save_total_limit=1
)
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_data,
    data_collator=data_collator
)
model.config.use_cache = False
trainer.train()
os.makedirs(OUTPUT_DIR, exist_ok=True)
model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)
print(f"Model and tokenizer saved at {OUTPUT_DIR}")
My data follows this pattern:
{
    "messages": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "{\"response\": \"...\"}"}
    ]
}
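For what it's worth, this is roughly how I sanity-check what the chat template produces for one record (just a sketch; the exact rendered text depends on the chat template shipped with the model's tokenizer):
sample = dataset[0]
rendered = tokenizer.apply_chat_template(
    sample["messages"],
    tokenize=False,
    add_generation_prompt=False,
)
print(rendered[:500])  # inspect the rendered prompt text
n_tokens = len(tokenizer(rendered, add_special_tokens=False)["input_ids"])
print(n_tokens)  # same count the filter compares against MAX_SEQ_LEN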
I picked a tutorial that uses QLoRA/PEFT because, if I understood correctly, it lets me train with less VRAM, and as you can see I only have 16 GB. So my question is: is there a way to achieve what I want? And if so, what am I doing wrong? (This is the first model I've tried to fine-tune.)
Also, these are my library versions:
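Here is my rough understanding of the memory budget, using the parameter counts printed below (back-of-envelope only, please correct me if this reasoning is off):
total_params = 6_760_501_248   # from print_trainable_parameters()
lora_params = 19_988_480

base_4bit_gb = total_params * 0.5 / 1e9   # 4-bit base weights, roughly 3.4 GB
lora_fp16_gb = lora_params * 2 / 1e9      # LoRA adapters in fp16, roughly 0.04 GB
adam_fp32_gb = lora_params * 8 / 1e9      # AdamW exp_avg/exp_avg_sq in fp32, roughly 0.16 GB
print(base_4bit_gb + lora_fp16_gb + adam_fp32_gb)  # roughly 3.6 GB before activations
If that is right, whatever is left of the 16 GB has to hold the activations and gradients for a full 8192-token padded sequence, which I assume is where it blows up, but I'm not sure.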
pip list | grep -E 'torch|transformers|accelerate|trl|datasets|bitsandbytes|peft|sentencepiece'
accelerate 1.7.0
bitsandbytes 0.45.5
datasets 3.6.0
fastrlock 0.8.3
peft 0.15.2
sentencepiece 0.2.0
torch 2.1.2+cu121
torchaudio 2.1.2
torchvision 0.16.2
transformers 4.51.3
trl 0.8.6
For now I get this OOM error when I run the code:
vllm_venv/lib/python3.10/site-packages/transformers/utils/hub.py:105: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
Loading checkpoint shards: 100%|████████████████████████████████████████| 2/2 [00:11<00:00, 5.78s/it]
trainable params: 19,988,480 || all params: 6,760,501,248 || trainable%: 0.2957
No label_names provided for model class `PeftModel`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
.......
[Traceback omitted, I don't think I need to copy it]
........
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB. GPU 0 has a total capacty of 15.74 GiB of which 73.38 MiB is free. Including non-PyTorch memory, this process has 15.66 GiB memory in use. Of the allocated memory 15.01 GiB is allocated by PyTorch, and 465.39 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
0%| | 0/106605 [00:01<?, ?it/s]
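The last line of the error mentions max_split_size_mb / PYTORCH_CUDA_ALLOC_CONF. As far as I understand that only helps with fragmentation rather than total usage, but for completeness this is how I would set it (assuming I read the PyTorch allocator docs right):
import os
# Must be set before the first CUDA allocation (i.e. before loading the model).
# 128 is an arbitrary value I picked; the docs don't prescribe one.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"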