The loss is 0.0 at epoch 0, so there must be something wrong with my code. I suspect the problem is in either my dataset or the loss calculation logic. I am entirely new to the LLM field. Could anyone point out the error?
My dataset:
max_length = 256
dataset = load_dataset('tatsu-lab/alpaca').map(
    lambda elem: {
        "input_ids": tokeniser.encode(
            elem["instruction"],
            padding="max_length",
            truncation=True,
            max_length=max_length
        ),
        "label_ids": tokeniser.encode(
            elem["text"],
            padding="max_length",
            truncation=True,
            max_length=max_length
        ),
        # "label": elem["output"],
    }
)
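A quick sanity check on one processed example may help narrow this down. The sketch below is plain Python and does not load the tokeniser or the model. `PAD_ID` and the sample token lists are made-up stand-ins for real tokeniser output, and the helper names are hypothetical. It reports how many non-padding tokens each field holds, and shows the common Hugging Face convention of setting padded label positions to -100 so that the loss skips them.

```python
# Sanity-check sketch for one tokenised example (plain Python, no model needed).
# PAD_ID and the sample lists are made-up stand-ins for real tokeniser output.

PAD_ID = 0           # assumption: in practice this would be tokeniser.pad_token_id
IGNORE_INDEX = -100  # label value conventionally skipped by the loss in Hugging Face models


def summarise_example(example, pad_id=PAD_ID):
    """Report the length and the number of non-padding tokens for each field."""
    summary = {}
    for key in ("input_ids", "label_ids"):
        ids = example[key]
        summary[key] = {
            "length": len(ids),
            "real_tokens": sum(1 for t in ids if t != pad_id),
        }
    return summary


def mask_padding(label_ids, pad_id=PAD_ID, ignore_index=IGNORE_INDEX):
    """Replace padding positions in the labels so the loss can skip them."""
    return [ignore_index if t == pad_id else t for t in label_ids]


# Hypothetical example, padded to length 8 instead of 256 to keep it readable.
example = {
    "input_ids": [101, 7, 8, 9, 0, 0, 0, 0],
    "label_ids": [101, 7, 8, 9, 12, 13, 0, 0],
}

summary = summarise_example(example)
masked = mask_padding(example["label_ids"])
print(summary)
print(masked)
```

If the tokeniser reuses its end-of-sequence token as the padding token, masking by token id would also hide genuine end-of-sequence labels, so this check is only a starting point.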
The training code:
trainer = Seq2SeqTrainer(
    model=model,
    train_dataset=dataset['train'],
    # eval_dataset=dataset['test'],
    args=training_args,
    # data_collator=data_collator,
)
trainer.train()
I also tried a modified Trainer, but the loss is still 0.0.
My trainer:
class ModifiedTrainer(Seq2SeqTrainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        return model(
            input_ids=inputs["input_ids"],
            attention_mask=torch.ones_like(inputs["input_ids"]).bool(),
            labels=inputs["labels"],
        ).loss
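For reference when reading the reported number: the `.loss` returned by language models in transformers is a mean cross-entropy over the label positions that are not set to -100. The sketch below reimplements that idea in plain Python for tiny hand-written logits. It is not the library's implementation, and the function name and toy values are hypothetical. Returning NaN when every position is ignored is a choice made for this sketch.

```python
import math

IGNORE_INDEX = -100  # assumption: label value that marks positions to skip


def token_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean negative log-likelihood over positions whose label is not ignored.

    logits: one list of raw scores per position (length = vocabulary size).
    labels: one target token id per position.
    """
    losses = []
    for scores, target in zip(logits, labels):
        if target == ignore_index:
            continue  # this position does not contribute to the loss
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        losses.append(log_norm - scores[target])
    if not losses:
        return float("nan")  # nothing to average over (choice made in this sketch)
    return sum(losses) / len(losses)


# Toy vocabulary of 3 tokens, 2 positions.
uniform_loss = token_cross_entropy([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [1, 2])
confident_loss = token_cross_entropy([[100.0, 0.0, 0.0], [0.0, 100.0, 0.0]], [0, 1])
all_ignored = token_cross_entropy([[0.0, 0.0, 0.0]], [IGNORE_INDEX])
print(uniform_loss, confident_loss, all_ignored)
```

In this toy setup, a model that knows nothing scores about ln(3), and the loss only approaches zero when nearly all probability sits on the correct token at every counted position. A logged loss of exactly 0.0 on the first step is therefore worth comparing against a manual forward pass on a single batch.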