IndexError: Invalid key: 16 is out of bounds for size 0

I’ve solved the error, and the cause turned out to be both more interesting and more stupid than expected:
The error is only raised when I wrap the base model “gpt2” with a PEFT LoRA adapter. The error messages are as follows:

The following columns in the training set don't have a corresponding argument in `PeftModel.forward` and have been ignored: input_ids, labels. If input_ids, labels are not expected by `PeftModel.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 0
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Training with DataParallel so batch size has been adjusted to: 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 9,198
  Number of trainable parameters = 1,179,648

Index error ................
.................

Here Num examples = 0 means there is no training data left to feed to the model: the Trainer decided that the columns the wrapped model apparently doesn't accept are not needed, and removed them, all of them.
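
In other words, my guess is that the Trainer looked at the wrapped model, treated both columns as unused, and stripped them, leaving an empty dataset behind. A toy reproduction of that idea (remove_columns is my guess at the mechanism, not code copied from the Trainer):

from datasets import Dataset

# Toy reproduction of my understanding of the failure, not the Trainer's actual
# code: once both columns are treated as "unused" and stripped, the dataset is
# effectively empty, so any row lookup blows up the same way the training run did.
ds = Dataset.from_dict({"input_ids": [[0, 1, 2]] * 32, "labels": [[0, 1, 2]] * 32})
stripped = ds.remove_columns(["input_ids", "labels"])
# stripped[16]  # -> IndexError: Invalid key: 16 is out of bounds for size 0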

Here’s how I define my trainer and model, with a custom dataset that contains two features: {'input_ids': tensor, 'labels': tensor}:

from transformers import AutoModelForCausalLM, TrainingArguments, default_data_collator
from peft import LoraConfig, TaskType, get_peft_model

training_args = TrainingArguments(
    "gpt2-lora-dp-trainer",
    per_device_train_batch_size=args['batch-size'],
    per_device_eval_batch_size=args['eval-batch-size'],
    num_train_epochs=args['train-epoch'],
    evaluation_strategy="epoch",
    remove_unused_columns=False,
)
peft_config = LoraConfig(
    peft_type=TaskType.CAUSAL_LM,
    base_model_name_or_path=model_name_or_path,
    r=args['lora-r'],
    lora_alpha=args['lora-alpha'],
    lora_dropout=args['lora-dropout'],
)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
model = get_peft_model(model, peft_config)

trainer = MyTrainer(
    model=model,
    data_collator=default_data_collator,
    train_dataset=valid_dataset,
    eval_dataset=valid_dataset,
    optimizers=(optimizer, lr_scheduler),
)

BUT, when I remove the PEFT wrapper and just use the base gpt2 model, the error message changes to:

***** Running training *****
  Num examples = 49,043
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Training with DataParallel so batch size has been adjusted to: 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 9,198
  Number of trainable parameters = 124,439,808

OutOfMemoryError                          Traceback (most recent call last)
.........................

Which suggests the Trainer accepts gpt2’s model.forward() parameter protocol but rejects PeftModel.forward()’s.
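
As far as I can tell, that column filtering is driven by the model's forward signature. A minimal sketch of the idea (my paraphrase of the check, not the actual Trainer code):

import inspect

from peft import PeftModel
from transformers import GPT2LMHeadModel

# Rough sketch of the column check (my paraphrase): keep a dataset column only
# if the model's forward() signature names it.
def kept_columns(model_cls, columns=("input_ids", "labels")):
    params = inspect.signature(model_cls.forward).parameters
    return [c for c in columns if c in params]

print(kept_columns(GPT2LMHeadModel))  # ['input_ids', 'labels'] -> data survives
print(kept_columns(PeftModel))        # [] -> everything looks "unused" and gets removed

Side note: the warning above names PeftModel.forward, i.e. the generic wrapper. If the LoraConfig had set task_type=TaskType.CAUSAL_LM (instead of peft_type=), get_peft_model should return a PeftModelForCausalLM, whose forward does list input_ids and labels, so the columns would probably have survived even with the default column removal.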

The error still remains when I customize the trainer as:

# custom data feed method
from transformers import Trainer

class MyTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        outputs = model(
            input_ids=inputs["input_ids"],
            labels=inputs["labels"],
        )
        return (outputs.loss, outputs) if return_outputs else outputs.loss

I tried manually feeding the data and labels to the PeftModel, but nothing changed; the IndexError is still raised.
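
Which, in hindsight, makes sense: the IndexError is raised while the dataloader indexes the already-stripped dataset, before compute_loss ever runs. And even if a batch had been built, it would have carried no input_ids or labels at all; a quick illustration (hypothetical, not taken from my actual run):

from transformers import default_data_collator

# Rows whose columns were all removed collate into an empty batch, so the
# inputs["input_ids"] lookup in compute_loss above could only raise a KeyError.
batch = default_data_collator([{}, {}])
print(batch)  # {}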

BUT, then I noticed that I had forgotten to pass the training args to the Trainer. After fixing this stupid mistake, things went back to normal.

training_args = TrainingArguments(
    "gpt2-lora-dp-trainer",
    per_device_train_batch_size=args['batch-size'],
    per_device_eval_batch_size=args['eval-batch-size'],
    num_train_epochs=args['train-epoch'],
    evaluation_strategy="epoch",
    remove_unused_columns=False,
)

trainer = MyTrainer(
    model=model,
    args=training_args,  # the argument I had forgotten to pass
    data_collator=default_data_collator,
    train_dataset=valid_dataset,
    eval_dataset=valid_dataset,
    optimizers=(optimizer, lr_scheduler),
)
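
For completeness, here is why the missing args mattered so much, as far as I can tell: when args is None, the Trainer quietly builds default TrainingArguments for itself (with an output_dir of "tmp_trainer", if I remember right), and those defaults have remove_unused_columns=True, so the False I set never reached the Trainer at all. A quick sanity check on the corrected setup before launching training:

from transformers import TrainingArguments

# Roughly what the Trainer falls back to when no args are passed
# ("tmp_trainer" is my recollection of the default output_dir).
fallback = TrainingArguments(output_dir="tmp_trainer")
print(fallback.remove_unused_columns)  # True -> columns would be dropped again

# With args actually passed, the flag is finally honored.
assert trainer.args.remove_unused_columns is False
print(len(trainer.train_dataset))  # back to the real number of examples
trainer.train()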

Though the question is stupid, it’s worth noting that the compatibility between Trainer and PeftModel is still not great.

Hope my story can help others tracing this error.