ESM-2 QLoRA (gradient checkpointing not compatible?)

AmelieSchreiber · September 4, 2023, 3:28pm

I’ve recently wandered from PEFT LoRA fine-tuning of ESM-2 models such as facebook/esm2_t6_8M_UR50D into Quantized Low Rank Adaptation (QLoRA) fine-tuning. However, it seems as though ESM-2 models do not allow gradient checkpointing. Does anyone know a workaround? See this issue on Github for example: EsmForSequenceClassification does not support gradient checkpointing · Issue #606 · facebookresearch/esm · GitHub

simecek · October 17, 2023, 8:48pm

Dear Amelie,

Have you solved it? I might try to help but I want to check first that it is still an open problem.

AmelieSchreiber · October 17, 2023, 10:10pm

It seems to be mostly resolved, although there is still some issue with using biases. For example, if I have a LoRA config like the following, I cannot use all I can only use none:

    peft_config = LoraConfig(
        task_type=TaskType.TOKEN_CLS,
        inference_mode=False,
        r=config["r"],
        lora_alpha=config["lora_alpha"],
        target_modules=[
            "query",
            "key",
            "value",
            "EsmSelfOutput.dense",
            "EsmIntermediate.dense",
            "EsmOutput.dense",
            "EsmContactPredictionHead.regression",
            "classifier"
        ],
        lora_dropout=config["lora_dropout"],
        bias="none",  # or "all" or "lora_only"
        # modules_to_save=["classifier"]
    )

Otherwise it seems to be working fine now. It wouldn’t hurt to have a second pair of eyes on it just to make sure everything is working properly.

Topic		Replies	Views
LoRA finetuning without quantization (8bit) 🤗Transformers	1	991	February 23, 2024
LoRA vs QLoRA finetuning performance on llama2 🤗Transformers	0	2884	September 4, 2023
Fine tuning for Llama2 based model with LoftQ quantization 🤗Transformers	7	2399	January 24, 2024
Memory consumption qlora with gradient checkpointing 🤗Transformers	0	447	January 28, 2024
How to combine LoRA and gradient_checkpointing in Whisper? 🤗Transformers	1	5003	August 17, 2023

ESM-2 QLoRA (gradient checkpointing not compatible?)

Related topics