Llama 2 fine-tuning with PEFT, QLoRA, and testing the model

Hi all, @philschmid, I hope you are doing well. For fine-tuning Llama 2, I created a CSV file in the Alpaca structure, with a text column containing ### Instruction, ### Input, and ### Response. I am confused about which PEFT/QLoRA method to use, since there are many different code examples. Could you please point me to code that is right for fine-tuning with the Alpaca structure, and for saving the model and running inference to test it? In some code I saw the tokenizer apply truncation and padding and set labels to -100, while in others no preprocessing is done at all. I appreciate your help. Many thanks.
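On the -100 question: Hugging Face causal-LM models ignore any label position set to -100 when computing the cross-entropy loss, so that preprocessing masks the prompt tokens and trains only on the response. A minimal stdlib sketch with made-up token ids (`mask_prompt_labels` is a hypothetical helper for illustration, not a library function):

```python
def mask_prompt_labels(input_ids, prompt_len, ignore_index=-100):
    """Copy input_ids into labels, masking the prompt portion so the
    loss is computed only on the response tokens."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = ignore_index
    return labels

# Example: 5 prompt tokens followed by 3 response tokens.
ids = [101, 102, 103, 104, 105, 201, 202, 203]
print(mask_prompt_labels(ids, prompt_len=5))
# -> [-100, -100, -100, -100, -100, 201, 202, 203]
```

Codebases that skip this step simply train on the full concatenated sequence, prompt included; both approaches work, masking just focuses the loss on the response.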


I suggest using the llama-recipes repo from Meta.

I'd recommend checking out the official example scripts.
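For reference, a typical QLoRA setup loads the base model in 4-bit with bitsandbytes and then attaches LoRA adapters via PEFT. A minimal configuration sketch, assuming `transformers`, `peft`, and `bitsandbytes` are installed (the hyperparameters are illustrative, not tuned):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # QLoRA's NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
```

From here the model can be passed to a regular `Trainer` (or `trl`'s `SFTTrainer`) together with the Alpaca-formatted dataset.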


@nielsr all the examples seem to load the model onto a single GPU before fine-tuning it. I found that with DeepSpeed only the training is done in parallel; the initial model still needs to fit on a single GPU first. Are you aware of any examples that show how to fine-tune a model that is loaded across multiple GPUs? Thanks.
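One option, assuming the `accelerate` library is installed: passing `device_map="auto"` at load time shards the checkpoint's layers across all visible GPUs (and CPU, if needed) rather than placing everything on one device. A sketch (the `max_memory` budgets are illustrative values, not recommendations):

```python
import torch
from transformers import AutoModelForCausalLM

# With accelerate installed, device_map="auto" places layers across
# every visible GPU so no single card has to hold the whole model.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "14GiB", 1: "14GiB"},  # optional per-GPU cap
)
```

Note this is simple layer placement, not parallel training; for sharding parameters during training itself, DeepSpeed ZeRO stage 3 or PyTorch FSDP is the usual route.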

@nielsr, many thanks for your help and tips. Sorry, I can't upload my data to the Hugging Face Hub. Can I read it directly in the code from a local CSV file? It has one column named "text", which is a concatenation of `### Human:` and `### Assistant:`. Is there code available for inference?
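Yes, a local CSV can be loaded directly, e.g. with `datasets.load_dataset("csv", data_files="train.csv")`; no Hub upload is needed. To illustrate building the concatenated text column, here is a stdlib-only sketch (the column names `instruction`/`input`/`response` and the template are assumptions about your file, adjust to your actual format):

```python
import csv
import io

# Hypothetical Alpaca-style template.
TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{response}"
)

def rows_to_text(csv_text):
    """Turn CSV rows into one concatenated prompt string per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [TEMPLATE.format(**row) for row in reader]

sample = (
    "instruction,input,response\n"
    "Summarize the text.,A long passage...,A short summary."
)
print(rows_to_text(sample)[0])
```

The same mapping can be applied to a `datasets` object with `.map()` to produce the single "text" column for training.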

@aytugkaya, have you used multiple GPUs? If so, would it be possible to share your code with me?

No, single GPU only.

Hello! I initialized a model and tokenizer to train on a T4 GPU using:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,
    device_map="auto",
    torch_dtype=torch.float16,
)
```
The configuration I used:

```python
from peft import LoraConfig, TaskType
from transformers import TrainingArguments

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
config = {
    "lora_config": lora_config,
    "learning_rate": 1e-4,
    "num_train_epochs": 1,
    "gradient_accumulation_steps": 4,
    "per_device_train_batch_size": 1,
}
training_args = TrainingArguments(
    output_dir=output_dir,
    overwrite_output_dir=True,
    fp16=True,  # use bf16 if available
    # logging strategies
    logging_dir=f"{output_dir}/logs",
    logging_strategy="steps",
    logging_steps=10,
    save_strategy="no",
    optim="adamw_torch_fused",
    max_steps=total_steps if enable_profiler else -1,
    **{k: v for k, v in config.items() if k != "lora_config"},
)
model.save_pretrained("/content/drive/MyDrive/Colab Notebooks/llama2/saved_model")
```
To load the trained model and use it, I did:

```python
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    model_id,
    return_dict=True,
    load_in_8bit=True,
    device_map="auto",
    low_cpu_mem_usage=True,
)
model = PeftModel.from_pretrained(
    model, peft_model, is_trainable=True, torch_dtype=torch.float16
)
```
Inference is working, and I don't want to load a checkpoint.
When I try to run a new fine-tuning with the trained model, I encounter this error:
```
OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 14.75 GiB total capacity; 13.34 GiB already allocated; 128.81 MiB free; 13.56 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
How can I do it?
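One thing worth checking: before starting a second fine-tuning run in the same process (e.g. the same Colab session), the first model has to actually be freed, otherwise its ~13 GiB stays allocated. A sketch of the usual cleanup (restarting the runtime is the reliable fallback):

```python
import gc

def reclaim_gpu_memory():
    """Call after `del model` / `del trainer` so Python frees the
    objects and PyTorch releases its cached CUDA blocks."""
    gc.collect()
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    except ImportError:
        pass  # torch absent: nothing CUDA-side to release

# usage sketch:
# del model, trainer
# reclaim_gpu_memory()
```

If memory is fragmented rather than exhausted, the error message's own suggestion also applies: set the environment variable `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128` before importing torch.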

I’ve got it now!
The first time I fine-tuned, I did:

```python
model = prepare_model_for_int8_training(model)
model = get_peft_model(model, peft_config)
```

When I loaded the model to continue training:

```python
peft_model = "/content/drive/MyDrive/Colab Notebooks/llama2/saved_model2"
model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(
    model_id,
    return_dict=True,
    load_in_8bit=True,
    device_map="auto",
    low_cpu_mem_usage=True,
)
model = prepare_model_for_int8_training(model)  # <<<<---- I forgot this part
model = PeftModel.from_pretrained(
    model, peft_model, is_trainable=True, torch_dtype=torch.float16
)
```

Hello @dametodata, are you using multiple GPUs or just one? I'm looking for code that works with multiple specified GPUs.

Hello! I used one!