Cannot use apply_chat_template() because tokenizer.chat_template is not set

Hi, I am new to Hugging Face and I have a problem with this code. I am trying to fine-tune my model on the Alpaca dataset “s3nh/alpaca-dolly-instruction-only-polish”. I am using QLoRA 4-bit quantization with these library versions:

accelerate==0.33.0
transformers==4.44.2
datasets==2.21.0
peft==0.12.0
trl==0.10.1

I am using this model, which is based on Llama 2:

model_name = "OPI-PG/Qra-7b"

original_tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    use_auth_token=os.getenv("HF_TOKEN")
)
# Supposedly there are some issues with the tokenizer here, but they shouldn't affect quality much
original_tokenizer.pad_token = original_tokenizer.eos_token
# tokenizer.padding_side = "right" # Fix weird overflow issue with fp16 training


original_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=nf4_config,
    device_map="auto",
    use_auth_token=os.getenv("HF_TOKEN")
)
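
For completeness, nf4_config is defined earlier in my notebook and not shown here; it is a standard 4-bit NF4 quantization config, roughly like this:

from transformers import BitsAndBytesConfig
import torch

# 4-bit NF4 quantization config for QLoRA; exact values are from memory,
# so treat this as a sketch rather than the literal notebook cell
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)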

I created a formatting function and already mapped the dataset to the conversational format:

system_message = """Jesteś przyjaznym chatbotem"""  # "You are a friendly chatbot"

def create_conversation(sample) -> dict:
    strip_characters = "\"'"
    return {
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user",
             "content": f"{sample['instruction'].strip(strip_characters)} "
                        f"{sample['input'].strip(strip_characters)}"},
            {"role": "assistant",
             "content": f"{sample['output'].strip(strip_characters)}"}
        ]
    }

dataset_mapped = dataset.map(create_conversation, remove_columns=dataset.features)
trainer = SFTTrainer(
    model=peft_model,
    train_dataset=dataset_mapped,
    peft_config=peft_config,
    packing=False,
    tokenizer=tokenizer,
    args=args,
)
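
For reference, this is roughly what one mapped example looks like (the field values here are made up, not an actual row from the dataset):

create_conversation({
    "instruction": "Przetłumacz zdanie na angielski.",
    "input": "Kot siedzi na macie.",
    "output": "The cat sits on the mat.",
})
# {'messages': [
#     {'role': 'system', 'content': 'Jesteś przyjaznym chatbotem'},
#     {'role': 'user', 'content': 'Przetłumacz zdanie na angielski. Kot siedzi na macie.'},
#     {'role': 'assistant', 'content': 'The cat sits on the mat.'}
# ]}

But then I get: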

ValueError: Cannot use apply_chat_template() because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at Chat Templates

Shouldn't this chat_template be inferred from the dataset I have provided? It's already in the conversational format.

Hi,

I can't find OPI-PG/Qra-7b on the Hub; can you link the model you'd like to fine-tune?

The error is raised because there’s no chat_template key present in the tokenizer_config.json.
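
If the model card doesn't provide one, you can set a template yourself. As a sketch, here is the generic Llama 2 chat format written as a Jinja string; I haven't checked which prompt format Qra was actually trained with, so adjust as needed:

# Generic Llama-2-style single-turn chat template (Jinja). This is an
# illustration, not necessarily the format the base model expects.
LLAMA2_TEMPLATE = (
    "{% for message in messages %}"
    "{% if message['role'] == 'system' %}"
    "{{ '<s>[INST] <<SYS>>\n' + message['content'] + '\n<</SYS>>\n\n' }}"
    "{% elif message['role'] == 'user' %}"
    "{{ message['content'] + ' [/INST]' }}"
    "{% elif message['role'] == 'assistant' %}"
    "{{ ' ' + message['content'] + ' </s>' }}"
    "{% endif %}"
    "{% endfor %}"
)

original_tokenizer.chat_template = LLAMA2_TEMPLATE

# Sanity check: render one mapped example to text
print(original_tokenizer.apply_chat_template(
    dataset_mapped[0]["messages"], tokenize=False
))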

The model is here: OPI-PG/Qra-7b · Hugging Face. But thanks, I have set tokenizer.chat_template = “<prompt_template>” and it looks like it works. Training runs.

Yet I have another problem: what is the relationship between per_device_train_batch_size and gradient_accumulation_steps? From what I see in the docs, the effective batch size is per_device_train_batch_size * gradient_accumulation_steps * n_of_gpus.

I am using the free version of Google Colab with an NVIDIA T4 GPU and 15 GB of GPU RAM.

The problem is: why is setting

 per_device_train_batch_size=1,
 gradient_accumulation_steps=4,

faster than setting this:

per_device_train_batch_size=8,
gradient_accumulation_steps=2

Shouldn't an effective batch size of 16 process data faster than an effective batch size of 4? When I looked at the progress bar after 10 minutes with the first configuration, it showed about 50 hours of training left. With the second one it's 100+ hours. Is this a bug, or could it be related to running low on memory?
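
To put numbers on it, on a single GPU:

n_gpus = 1  # single T4 on free Colab

# config 1: per_device_train_batch_size=1, gradient_accumulation_steps=4
effective_bs_1 = 1 * 4 * n_gpus   # = 4

# config 2: per_device_train_batch_size=8, gradient_accumulation_steps=2
effective_bs_2 = 8 * 2 * n_gpus   # = 16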

The first config uses 7 GB / 15 GB of GPU RAM, the second uses 14.5 GB / 15 GB. Here is my TrainingArguments instance:

args = TrainingArguments(
    output_dir="Qra-7b-dolly-instruction-qlora-0.1",
    num_train_epochs=1,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2, 
    gradient_checkpointing=False,
    optim="adamw_torch_fused",  
    logging_steps=100,
    save_strategy="epoch",
    learning_rate=2e-4,
    bf16=True, 
    tf32=False, 
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    lr_scheduler_type="constant", 
    push_to_hub=False,
    # report_to=["tensorboard"],
)

Indeed, increasing the batch size should decrease the training time. However, looking at s3nh/alpaca-dolly-instruction-only-polish · Datasets at Hugging Face, which only consists of about 23k rows, it shouldn't take that long for a single epoch.
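
As a rough sanity check, assuming one optimizer step per effective batch of examples:

rows = 23_000  # approximate size of the train split

# config 1: effective batch size 1 * 4 = 4
steps_config_1 = rows // 4    # about 5,750 optimizer steps per epoch

# config 2: effective batch size 8 * 2 = 16
steps_config_2 = rows // 16   # about 1,437 optimizer steps per epoch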

Can you confirm the GPU is being used (for instance by checking nvidia-smi in the terminal)?

Yes, I am running on a Colab Tesla T4:

Mon Sep  9 11:20:48 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   75C    P0              31W /  70W |   5331MiB / 15360MiB |      3%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

Hello, recently I wanted to use OpenCompass to evaluate TIGER-Lab/MAmmoTH-7B (a model based on the Llama-2-7B base) and I ran into the same error. I also set tokenizer.chat_template = “<prompt_template>”, but now the model outputs are all garbage along these lines:

Please provide a value for the variable
<input type="text" name="variable" placeholder="Please provide a value for the variable" required>
1
2
3
4
<result name="variable" id="1" value="1" />
<result name="variable" id="2" value="2" />
<result name="variable" id="3" value="3" />
<result name="variable" id="4" value="4" />
...

and it keeps going like that, even echoing </prompt_template> back and explaining how to "use this prompt".