I am trying to do the above starting with llama2-7b, but I have been running into ugly-looking errors which I have not been able to resolve by searching. There are probably too many areas where I am filling in the blanks (i.e. guessing) to hit on the solution by chance, hence this post.
import os
import torch
from datasets import Dataset
from awq import AutoAWQForCausalLM
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    AwqConfig,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer
# Model from Hugging Face hub
base_model = "NousResearch/Llama-2-7b-chat-hf"
dataset = Dataset.from_dict({
    "label": [ <list> ],
    "text": [ <list> ],
})
# compute_dtype = getattr(torch, "float16")
# quant_config = BitsAndBytesConfig(
#     load_in_4bit=True,
#     bnb_4bit_quant_type="nf4",
#     bnb_4bit_compute_dtype=compute_dtype,
#     bnb_4bit_use_double_quant=False,
# )
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    # quantization_config=quant_config,
    device_map={"": 0},
)
model.config.use_cache = False
model.config.pretraining_tp = 1
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
peft_params = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)
training_params = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=25,
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant",
    report_to="tensorboard",
)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_params,
    dataset_text_field="text",
    max_seq_length=None,
    tokenizer=tokenizer,
    args=training_params,
    packing=False,
)
trainer.train()
# Save the LoRA adapter and the tokenizer
trainer.model.save_pretrained('lora')
trainer.tokenizer.save_pretrained('tokenizer')
peft_model = PeftModel.from_pretrained(
    model,
    'lora',
)
peft_model.merge_and_unload()
peft_model.save_pretrained('merged_model')
# Quantize the merged model with AWQ and save it
model = AutoAWQForCausalLM.from_pretrained('merged_model', device_map='auto')
quant_name = 'merged-model-AWQ'
quant_config = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
}
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_name, safe_tensors=True)
tokenizer.save_pretrained(quant_name)
(disclaimer - I adapted the above slightly to make it easier to present and didn’t test the adapted version, so there may be typos).
I confess I took the parameters from examples and have not been through them exhaustively - so that could be an issue. However, right now the objective is simply to get through the process without it ending in a horrible error, and I suspect something more basic is the likely problem.
Re. quantization_config - I've shown this commented out, as I've tried it both with and without. I am not sure whether loading the model in 4-bit would result in a quantized output when saved (or perhaps it just affects precision?). I'm not really sure about this part, but it doesn't seem to make any difference to the result either way.
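For what it's worth, I assume a quick check along these lines would at least show whether the 4-bit load is taking effect (this isn't in my original script, just my guess at how to verify it):

# Hypothetical sanity check (my assumption): a 4-bit 7B model should report
# a memory footprint of roughly 3-4 GB, versus ~13 GB in fp16.
print(model.get_memory_footprint())
print(getattr(model.config, "quantization_config", None))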
So the adapter seems to save successfully in the lora directory. However, merged_model ends up looking like the lora adapter directory, except that it contains an apparently empty adapter_model.safetensors. The program then falls over with:

OSError: merged_model does not appear to have a file named config.json

It doesn't look like the merge worked, so I am guessing this is the problem I need to resolve.
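One thing I am unsure about is whether merge_and_unload() merges in place or returns a new, merged model. If it is the latter, maybe the save should look more like this (just a guess on my part, not something I have confirmed):

# Guess: capture the return value of merge_and_unload() and save that instead
merged = peft_model.merge_and_unload()
merged.save_pretrained('merged_model')
tokenizer.save_pretrained('merged_model')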
I did try copying the config.json from the original model into merged_model - which was suggested somewhere. Then I get as far as:

safetensors_rust.SafetensorError: Error while deserializing header: InvalidHeaderDeserialization
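(By "copying in the config.json" I mean something along these lines - a hypothetical sketch of the step, not my exact commands:

import shutil
from huggingface_hub import hf_hub_download
# Pull the base model's config.json and drop it into the merged_model directory
cfg_path = hf_hub_download(repo_id=base_model, filename="config.json")
shutil.copy(cfg_path, "merged_model/config.json")
)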
but I am guessing this is all a bit pointless anyway, since merged_model is essentially empty. I also ran into a comment suggesting that merge_and_unload() does not work with safetensors, and that the finetuned model should therefore be saved in .bin format - however, I am not sure how to get it to output that format.
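If it helps to be concrete, my assumption is that passing safe_serialization=False to save_pretrained is what forces .bin output instead of safetensors, e.g.:

# Untested guess at forcing .bin output rather than safetensors
trainer.model.save_pretrained('lora', safe_serialization=False)

but I have not confirmed that this is the right approach, or that it is needed at all.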
Lost in many areas! Any help would be appreciated.