ESM-2 QLoRA (gradient checkpointing not compatible?)

I’ve recently moved from PEFT LoRA fine-tuning of ESM-2 models such as facebook/esm2_t6_8M_UR50D to Quantized Low-Rank Adaptation (QLoRA) fine-tuning. However, it seems that ESM-2 models do not support gradient checkpointing. Does anyone know a workaround? See this issue for example: EsmForSequenceClassification does not support gradient checkpointing · Issue #606 · facebookresearch/esm · GitHub
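For context, a rough sketch of the kind of QLoRA setup where the error appears (the quantization settings and the `use_gradient_checkpointing=False` flag are assumptions on my part, not a confirmed fix — disabling checkpointing merely sidesteps the error at the cost of higher activation memory):

    import torch
    from transformers import AutoModelForTokenClassification, BitsAndBytesConfig
    from peft import prepare_model_for_kbit_training

    # Load ESM-2 with 4-bit NF4 quantization, as is typical for QLoRA
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForTokenClassification.from_pretrained(
        "facebook/esm2_t6_8M_UR50D",
        quantization_config=bnb_config,
    )
    # prepare_model_for_kbit_training() enables gradient checkpointing by
    # default, which is where the "does not support gradient checkpointing"
    # error is raised; passing False skips that call entirely.
    model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=False)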


Dear Amelie,

Have you solved it? I might be able to help, but I want to check first whether it is still an open problem.

It seems to be mostly resolved, although there is still an issue with biases. For example, with a LoRA config like the following, I cannot use `bias="all"`; only `bias="none"` works:

    from peft import LoraConfig, TaskType

    peft_config = LoraConfig(
        task_type=TaskType.TOKEN_CLS,
        inference_mode=False,  # training, not inference
        r=config["r"],  # LoRA rank
        lora_alpha=config["lora_alpha"],  # LoRA scaling factor
        target_modules=[
            "query",
            "key",
            "value",
            "EsmSelfOutput.dense",
            "EsmIntermediate.dense",
            "EsmOutput.dense",
            "EsmContactPredictionHead.regression",
            "classifier"
        ],
        lora_dropout=config["lora_dropout"],
        bias="none",  # "all" and "lora_only" currently fail for me
        # modules_to_save=["classifier"]
    )
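For completeness, here is roughly how such a config gets attached to the model (the checkpoint is the one from the original post; `num_labels` and the `config` dict holding the hyperparameters are assumed placeholders):

    from transformers import AutoModelForTokenClassification
    from peft import get_peft_model

    base_model = AutoModelForTokenClassification.from_pretrained(
        "facebook/esm2_t6_8M_UR50D",
        num_labels=2,  # assumed: binary per-token labels
    )
    # Wrap the base model; only the injected LoRA adapters (and, depending
    # on the bias setting, bias terms) remain trainable.
    model = get_peft_model(base_model, peft_config)
    model.print_trainable_parameters()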

Otherwise it seems to be working fine now. It wouldn’t hurt to have a second pair of eyes on it just to make sure everything is working properly.