Hi, I am new to Hugging Face and I have a problem with this code. I am trying to fine-tune my model on the Alpaca-style dataset "s3nh/alpaca-dolly-instruction-only-polish". I am using QLoRA with 4-bit quantization (config sketched just below the version list) and these library versions:
accelerate==0.33.0
transformers==4.44.2
datasets==2.21.0
peft==0.12.0
trl==0.10.1
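For context, the nf4_config that gets passed to from_pretrained further down is a fairly standard NF4 QLoRA setup, roughly like this (the exact compute dtype and double-quantization flags may differ slightly on my end):

import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization config used for QLoRA
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)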
I am using this model, which is based on Llama 2:
model_name = "OPI-PG/Qra-7b"
original_tokenizer = AutoTokenizer.from_pretrained(
model_name,
use_auth_token=os.getenv("HF_TOKEN")
)
# Tu niby jakies problemy z tokenizerem, ale chyba nie powinny mocna wplywac na jakos
original_tokenizer.pad_token = original_tokenizer.eos_token
# tokenizer.padding_side = "right" # Fix weird overflow issue with fp16 training
original_model = AutoModelForCausalLM.from_pretrained(
model_name,
quantization_config=nf4_config,
device_map="auto",
use_auth_token=os.getenv("HF_TOKEN")
)
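The peft_config and peft_model used in the trainer below come from a standard QLoRA LoRA setup, roughly like this (rank, alpha and target modules are illustrative, not necessarily my exact values):

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepare the 4-bit base model for training and attach LoRA adapters
original_model = prepare_model_for_kbit_training(original_model)
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
peft_model = get_peft_model(original_model, peft_config)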
I created a formatting function and already mapped the dataset to the conversational format:
system_message = """Jesteś przyjaznym chatbotem"""  # "You are a friendly chatbot"

def create_conversation(sample) -> dict:
    strip_characters = "\"'"
    return {
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user",
             "content": f"{sample['instruction'].strip(strip_characters)} "
                        f"{sample['input'].strip(strip_characters)}"},
            {"role": "assistant",
             "content": f"{sample['output'].strip(strip_characters)}"}
        ]
    }
dataset_mapped = dataset.map(create_conversation, remove_columns=dataset.features)
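To sanity-check the mapping, printing the first record should show a single messages list per sample:

print(dataset_mapped[0]["messages"])
# [{'role': 'system', 'content': 'Jesteś przyjaznym chatbotem'},
#  {'role': 'user', 'content': '...'},
#  {'role': 'assistant', 'content': '...'}]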
trainer = SFTTrainer(
    model=peft_model,
    train_dataset=dataset_mapped,
    peft_config=peft_config,
    packing=False,
    tokenizer=tokenizer,
    args=args,
)
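args is an ordinary trl SFTConfig (a TrainingArguments subclass), roughly along these lines; the hyperparameters shown here are placeholders rather than my exact values:

from trl import SFTConfig

args = SFTConfig(
    output_dir="qra-7b-alpaca-pl",   # placeholder output directory
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    bf16=True,
    logging_steps=10,
    max_seq_length=1024,
)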
This fails with:
ValueError: Cannot use apply_chat_template() because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at Chat Templates
Shouldn't this chat_template be inferred from the dataset I have provided? It's already in the conversational format.