IndexError: Invalid key: 16 is out of bounds for size 0

I tried it, but it gives this error:
"ValueError: You should supply an encoding or a list of encodings to this method that includes input_ids, but you provided "
Has this ever happened to you?
If so, could you tell me whether the data needs a specific format in order to use this method?


cc @ybelkada since it's related to trl

I've solved the error, and the cause turned out to be more interesting (and sillier) than expected:
The error is only raised when I use PEFT LoRA to wrap the base model "gpt2". The error messages are as follows:

The following columns in the training set don't have a corresponding argument in `PeftModel.forward` and have been ignored: input_ids, labels. If input_ids, labels are not expected by `PeftModel.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 0
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Training with DataParallel so batch size has been adjusted to: 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 9,198
  Number of trainable parameters = 1,179,648

Index error ................
.................

Here Num examples = 0 means there is no training data left to feed the model: the Trainer judged that the columns the model's forward() refused were not the ones we needed, and removed them (all of them).
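To see why: with remove_unused_columns=True (the default), the Trainer prunes every dataset column whose name does not match an argument of the model's forward() before training starts. Here is a minimal sketch of that pruning logic, assuming it mirrors the Trainer's internal column check (the helper name columns_to_drop is mine, for illustration only):

import inspect

def columns_to_drop(model, dataset_columns):
    # The Trainer inspects the wrapped model's forward() signature...
    accepted = set(inspect.signature(model.forward).parameters)
    # ...and drops every column whose name is not an argument of forward().
    return [col for col in dataset_columns if col not in accepted]

Because PeftModel.forward (in the peft version used here) accepted only *args/**kwargs, neither 'input_ids' nor 'labels' matched, every column was dropped, and the dataset ended up with size 0, which is exactly what the IndexError complains about.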

Here's the definition of my trainer and model, with my custom dataset, which contains two features: {'input_ids': tensor, 'labels': tensor}:

from transformers import TrainingArguments, AutoModelForCausalLM, default_data_collator
from peft import LoraConfig, TaskType, get_peft_model

training_args = TrainingArguments(
    "gpt2-lora-dp-trainer",
    per_device_train_batch_size=args['batch-size'],
    per_device_eval_batch_size=args['eval-batch-size'],
    num_train_epochs=args['train-epoch'],
    evaluation_strategy="epoch",
    remove_unused_columns=False,
)
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    base_model_name_or_path=model_name_or_path,
    r=args['lora-r'],
    lora_alpha=args['lora-alpha'],
    lora_dropout=args['lora-dropout'],
)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
model = get_peft_model(model, peft_config)

trainer = MyTrainer(
    model=model,
    data_collator=default_data_collator,
    train_dataset=valid_dataset,
    eval_dataset=valid_dataset,
    optimizers=(optimizer, lr_scheduler),
)
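(The dataset itself is not shown in the thread; as a point of reference, here is a hypothetical sketch of how a two-feature dataset in that shape could be built. The sample texts and max_length are made up.)

from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True, max_length=512)
    out["labels"] = out["input_ids"].copy()  # causal LM: labels mirror input_ids
    return out

valid_dataset = Dataset.from_dict({"text": ["some text", "more text"]})
valid_dataset = valid_dataset.map(tokenize, batched=True, remove_columns=["text"])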

BUT when I remove the PEFT model wrapper and just use the base GPT-2 model, the error message changes to:

***** Running training *****
  Num examples = 49,043
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Training with DataParallel so batch size has been adjusted to: 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 9,198
  Number of trainable parameters = 124,439,808

OutOfMemoryError                          Traceback (most recent call last)
.........................

This means the Trainer accepts GPT-2's model.forward(**args) parameter protocol but refuses the PeftModel.forward() one.
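A quick diagnostic (my own check, not from the thread) makes the mismatch visible by printing the two signatures the Trainer inspects:

import inspect
from transformers import AutoModelForCausalLM
from peft import get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
print(inspect.signature(base.forward))  # named args: input_ids, labels, ...

peft_model = get_peft_model(base, peft_config)  # peft_config as defined above
print(inspect.signature(peft_model.forward))
# In the peft version used here, this delegates via *args/**kwargs,
# so input_ids and labels never appear by name.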

The error still remains when I customize the trainer as:

from transformers import Trainer

# custom data-feeding method: pass input_ids and labels to the model explicitly
class MyTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        outputs = model(
            input_ids=inputs["input_ids"],
            labels=inputs["labels"],
        )
        return (outputs.loss, outputs) if return_outputs else outputs.loss

I tried manually feeding the data and labels to the PeftModel, but nothing changed; the IndexError was still raised.

BUT then I noticed that I had forgotten to pass the training args to the Trainer. After fixing this silly mistake, things got back on the rails:

training_args = TrainingArguments(
    "gpt2-lora-dp-trainer",
    per_device_train_batch_size=args['batch-size'],
    per_device_eval_batch_size=args['eval-batch-size'],
    num_train_epochs=args['train-epoch'],
    evaluation_strategy="epoch",
    remove_unused_columns=False,
)

trainer = MyTrainer(
    model=model,
    args=training_args,
    data_collator=default_data_collator,
    train_dataset=valid_dataset,
    eval_dataset=valid_dataset,
    optimizers=(optimizer, lr_scheduler),
)

Though the question is a silly one, it's worth noting that the compatibility between Trainer and PeftModel is still not good.

I hope my story helps with tracing this error.

Just add remove_unused_columns=False to TrainingArguments
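For example (a minimal version, reusing the output dir from the snippets above):

from transformers import TrainingArguments

training_args = TrainingArguments(
    "gpt2-lora-dp-trainer",
    remove_unused_columns=False,  # keep input_ids/labels even if forward() hides them
)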


Holy Gradient! After this whole thread, you've provided the simple solution! It works well for the LoRA problem!