PEFT fine-tuning as slow as full model fine-tuning

Hello there,

I am trying to fine-tune an XLM-R model using PEFT, but training seems to be slower with PEFT than when fine-tuning the full model. Is this the expected behaviour?

Code sample

    import torch
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        TrainingArguments,
    )

    # CATEGORIES, checkpoint, dataset_name, config, args and MultilabelTrainer
    # are defined elsewhere in the script.
    dataset = load_dataset(dataset_name)

    device = "cuda:0" if (torch.cuda.is_available() and not args.cpu) else "cpu"
    print(f"Training on {device} {torch.cuda.is_available()}")

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        problem_type="regression",
        num_labels=len(CATEGORIES),
    )

    max_length = (
        tokenizer.max_model_input_sizes["xlm-roberta-base"]
        if "xlm" in checkpoint.lower()
        else tokenizer.model_max_length
    )

    print(f"Truncating to {max_length} tokens")
    
    tokenized_datasets = dataset.map(
        lambda samples: tokenizer(
            samples["text"],
            padding="longest",
            truncation=True,  # actually truncate to the limit announced above
            max_length=max_length,
        ),
        batched=True,
        batch_size=512,
        num_proc=12,
    )

    # Evaluate every 20% of training set.
    steps_by_evaluation = int(
        dataset["train"].shape[0] / config["training_batch_size"] / 5
    )
    print(f"Evaluating every {steps_by_evaluation}")
    
    # Lora part
    model.gradient_checkpointing_enable()
    model = prepare_model_for_kbit_training(model)
    
    lora_config = LoraConfig(
        r=8,
        lora_alpha=32,
        inference_mode=False,
        lora_dropout=0.05,
        bias="none",
        task_type="SEQ_CLS",
    )

    model = get_peft_model(model, lora_config).to(device)
    model.print_trainable_parameters()

    training_args = TrainingArguments(
        output_dir="xlmr-lora",  # placeholder path; TrainingArguments requires one
        evaluation_strategy="steps",
        save_strategy="steps",
        eval_steps=steps_by_evaluation,
        save_steps=steps_by_evaluation,
        learning_rate=2e-5,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        num_train_epochs=3,
        weight_decay=1,
        warmup_ratio=0.1,
        seed=42,
    )
    
    trainer = MultilabelTrainer(
        model,
        training_args,
        train_dataset=tokenized_datasets["train"],
        eval_dataset=tokenized_datasets["validation"],
        tokenizer=tokenizer,
    )
   
    trainer.train()

While I am here, I have a second question. Following the example notebook but using IA3 instead of LoRA, I get the following error:

    ValueError: Please specify `target_modules` in `peft_config`

I can’t find an example of an IA3 config with `target_modules` anywhere in the docs. Moreover, the example notebook doesn’t seem to work…
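
For reference, here is the kind of config I tried to write myself. It is only a sketch: the module names are my own guesses from calling `print(model)` on xlm-roberta-base, not values taken from the docs.

    from peft import IA3Config

    # Sketch of an IA3 config. The target module names below are guesses
    # based on the XLM-R (RoBERTa-style) architecture and may not be the
    # intended targets.
    ia3_config = IA3Config(
        task_type="SEQ_CLS",
        target_modules=["key", "value", "output.dense"],
        # feedforward_modules must be a subset of target_modules
        feedforward_modules=["output.dense"],
    )
    model = get_peft_model(model, ia3_config)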

Thank you in advance for your time.

Perhaps part of an answer. From the original LoRA paper:

    We also observe a 25% speedup during training on GPT-3 175B compared to full fine-tuning as we do not need to calculate the gradient for the vast majority of the parameters.

I suppose that for a much smaller model, given the relative overhead of the implementation, training could end up slightly slower.
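
One other thing I notice in your snippet: you enable gradient checkpointing, which recomputes activations during the backward pass to save memory at the cost of extra compute, and you call `prepare_model_for_kbit_training`, which is meant for base models loaded in 8-bit or 4-bit. Both add per-step overhead, and gradient checkpointing in particular is a common cause of slow training. As a sketch, keeping the rest of your script unchanged, I would compare against a run without them:

    # Same LoRA setup, but without gradient checkpointing and without
    # prepare_model_for_kbit_training (only needed for quantized base models).
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        problem_type="regression",
        num_labels=len(CATEGORIES),
    )
    model = get_peft_model(model, lora_config).to(device)
    model.print_trainable_parameters()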

How much data are you using to fine-tune the model?
Also, what kind of setup are you training on: a personal computer, a GPU cluster in the cloud, etc.?

I’m working on an A10 GPU, with around 10M text samples.