I’ve recently wandered from PEFT LoRA fine-tuning of ESM-2 models such as facebook/esm2_t6_8M_UR50D
into Quantized Low Rank Adaptation (QLoRA) fine-tuning. However, it seems that ESM-2 models do not support gradient checkpointing. Does anyone know a workaround? See, for example, this issue on GitHub: EsmForSequenceClassification does not support gradient checkpointing · Issue #606 · facebookresearch/esm · GitHub
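
For context, here is a minimal sketch of the kind of setup I mean. The failure shows up because `prepare_model_for_kbit_training` enables gradient checkpointing by default; the only stopgap I can see is passing `use_gradient_checkpointing=False`, which trains but gives up the activation-memory savings. The `target_modules` names are my assumption based on the attention projection layers in `modeling_esm.py`, and the label count is just a placeholder:

```python
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Standard QLoRA 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "facebook/esm2_t6_8M_UR50D",
    num_labels=2,  # placeholder label count for illustration
    quantization_config=bnb_config,
)

# prepare_model_for_kbit_training turns on gradient checkpointing by
# default, which is what errors out for ESM-2. Passing
# use_gradient_checkpointing=False skips that step, at the cost of
# higher activation memory.
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=False)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="SEQ_CLS",
    target_modules=["query", "key", "value"],  # assumed from ESM attention layers
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Disabling checkpointing somewhat defeats the memory-saving point of QLoRA, so I'd love to hear if anyone has a proper fix (e.g., a patched `gradient_checkpointing_enable` for the ESM architecture).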