IndexError: Invalid key: 16 is out of bounds for size 0

I tried that, but it gives this error:
"ValueError: You should supply an encoding or a list of encodings to this method that includes input_ids, but you provided "
Has this ever happened to you?
If so, could you tell me if there is a specific formatting of the data in order to use this method?
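
For reference, that ValueError usually means the examples reaching the collator contain no input_ids at all. Here is a minimal sketch of pre-tokenizing a text dataset so every example carries input_ids and labels (the "text" column name and the GPT-2 tokenizer are only placeholder assumptions):

from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

raw = Dataset.from_dict({"text": ["hello world", "another example"]})

def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True, max_length=128)
    out["labels"] = [ids.copy() for ids in out["input_ids"]]  # causal LM: labels mirror input_ids
    return out

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
print(tokenized.column_names)  # ['input_ids', 'attention_mask', 'labels']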


cc @ybelkada since it's related to trl

I've solved the error, and the cause is more interesting and more stupid than expected:
The error is raised only when I use PEFT-LoRA to wrap the base model "gpt2". The error messages are as follows:

The following columns in the training set don't have a corresponding argument in `PeftModel.forward` and have been ignored: input_ids, labels. If input_ids, labels are not expected by `PeftModel.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 0
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Training with DataParallel so batch size has been adjusted to: 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 9,198
  Number of trainable parameters = 1,179,648

Index error ................
.................

Here, Num examples = 0 means there is no training data that can be fed to the model: the Trainer judged that the data the model refused (the columns not found in its forward signature) was not needed, and removed all of it.

Here is the definition of my trainer and model, with my custom dataset, which contains two features: {'input_ids': tensor, 'labels': tensor}:

training_args = TrainingArguments(
    "gpt2-lora-dp-trainer",
    per_device_train_batch_size=args['batch-size'],
    per_device_eval_batch_size=args['eval-batch-size'],
    num_train_epochs=args['train-epoch'],
    evaluation_strategy="epoch",
    remove_unused_columns=False,
    )
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    base_model_name_or_path=model_name_or_path,
    r=args['lora-r'],
    lora_alpha=args['lora-alpha'],
    lora_dropout=args['lora-dropout'],
)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path)

# wrap the base model with LoRA adapters
model = get_peft_model(model, peft_config)
trainer = MyTrainer(
    model = model,
    data_collator = default_data_collator,
    train_dataset = valid_dataset,
    eval_dataset = valid_dataset,
    optimizers = (optimizer, lr_scheduler),
)
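
In hindsight, a quick sanity check would have shown that the data itself was fine (just a sketch, assuming valid_dataset is a datasets.Dataset):

# The dataset still has its rows and the expected columns at this point; it is
# the Trainer's column pruning (remove_unused_columns) that later empties the batches.
print(len(trainer.train_dataset), trainer.train_dataset.column_names)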

BUT, when I remove the PEFT model wrapper and just use the base model gpt-2, the error message changes to:

***** Running training *****
  Num examples = 49,043
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Training with DataParallel so batch size has been adjusted to: 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 9,198
  Number of trainable parameters = 124,439,808

OutOfMemoryError                          Traceback (most recent call last)
.........................

This means the Trainer accepts gpt-2's model.forward(**args) parameter protocol but rejects the PeftModel.forward() one.
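
As far as I understand the Trainer source (treat this as an assumption): with remove_unused_columns=True it inspects the model's forward signature and drops every dataset column whose name does not appear there, and the PEFT wrapper exposes a different forward signature than GPT-2. A quick way to compare the two:

import inspect

from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")
wrapped = get_peft_model(
    AutoModelForCausalLM.from_pretrained("gpt2"),
    LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16, lora_dropout=0.1),
)

# The Trainer keeps only the dataset columns whose names appear in forward's signature.
print(list(inspect.signature(base.forward).parameters))
print(list(inspect.signature(wrapped.forward).parameters))  # may be generic, depending on the peft version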

The error still remains when I customize the trainer as:

# custom Trainer: feed input_ids and labels to the model explicitly
class MyTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        outputs = model(
            input_ids=inputs["input_ids"],
            labels=inputs["labels"],
        )
        return (outputs.loss, outputs) if return_outputs else outputs.loss

I tried manually feeding the data and labels to the PeftModel, but nothing changed; the IndexError is still raised.

BUT, then I noticed that I had forgotten to pass the training args to the Trainer. After fixing this stupid mistake, things went back to normal.

training_args = TrainingArguments(
    "gpt2-lora-dp-trainer",
    per_device_train_batch_size=args['batch-size'],
    per_device_eval_batch_size=args['eval-batch-size'],
    num_train_epochs=args['train-epoch'],
    evaluation_strategy="epoch",
    remove_unused_columns=False,
    )

trainer = MyTrainer(
    model = model,
    args = training_args,
    data_collator = default_data_collator,
    train_dataset = valid_dataset,
    eval_dataset = valid_dataset,
    optimizers = (optimizer, lr_scheduler),
)

Though the mistake was stupid, it is worth noting that the compatibility between Trainer and PeftModel is still not great.

Hope my story helps anyone tracing this error.

Just add `remove_unused_columns=False` to `TrainingArguments`.
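
In case it helps, a minimal sketch of where the flag goes (output directory and batch settings are placeholders):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2-lora-run",      # placeholder
    per_device_train_batch_size=8,
    num_train_epochs=3,
    remove_unused_columns=False,     # keep input_ids/labels for the PEFT-wrapped model
)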


Holy Gradient! After this whole thread, you provide a simple solution! It works well for the LoRA problem!


Great solution!

Hi sfalk,

The error "IndexError: Invalid key: 16 is out of bounds for size 0" suggests your NewDataset might be empty. The training loop tries to access the element at index 16, but there are zero elements in the dataset.

Double-check your _generate_examples function to ensure it's yielding data correctly. You can also try printing the number of examples generated during the initial creation to verify.
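
A minimal sketch of the shape _generate_examples should have in a datasets loading script (the JSON-lines layout and field names are assumptions, and _split_generators is omitted):

import json
import datasets

class NewDataset(datasets.GeneratorBasedBuilder):
    """Sketch only; _split_generators is omitted for brevity."""

    def _info(self):
        return datasets.DatasetInfo(
            features=datasets.Features(
                {"text": datasets.Value("string"), "label": datasets.Value("int64")}
            )
        )

    def _generate_examples(self, filepath):
        # Must yield (unique_key, example) pairs; if nothing is yielded,
        # the split ends up with size 0 and indexing it raises IndexError.
        with open(filepath, encoding="utf-8") as f:
            for idx, line in enumerate(f):
                record = json.loads(line)
                yield idx, {"text": record["text"], "label": record["label"]}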

Here are some additional tips for avoiding caching issues:

- Consider using yield from to stream data directly from your generation process instead of building the entire dataset in memory (see the sketch after this list).
- If you must cache, explore libraries like dask or ray for parallel processing and memory management.
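
For the first tip, here is a small sketch of streaming examples lazily into a datasets.Dataset with a generator (the synthetic records are placeholders; Dataset.from_generator is available in recent versions of datasets):

from datasets import Dataset

def read_chunk(n):
    # Placeholder producer; in practice this would read files lazily.
    for i in range(n):
        yield {"text": f"example {i}", "label": i % 2}

def stream_records():
    # yield from keeps everything streaming instead of building a big list first.
    yield from read_chunk(1000)

ds = Dataset.from_generator(stream_records)
print(len(ds))  # 1000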