I’m training LoRA adapters for large models using Accelerate + DeepSpeed with ZeRO-3, via the alignment-handbook implementation. However, when saving checkpoints, the full model is being saved, whereas I only need the adapter (and possibly the optimizer states). Is it possible to configure the training so that only the adapters are saved? Roughly, what I’d like is something like the sketch below.
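A minimal sketch of the save pattern I have in mind, assuming the model is a PEFT-wrapped `PeftModel` prepared by Accelerate. The tiny `gpt2` stand-in, the LoRA hyperparameters, and the `outputs/adapter` path are my own placeholders, not alignment-handbook code; the point is the final save call, which gathers the ZeRO-3 shards and then writes only the adapter weights:

```python
import torch
from accelerate import Accelerator
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

accelerator = Accelerator()

# Tiny stand-in model for illustration (the real target is Llama-70B)
base = AutoModelForCausalLM.from_pretrained("gpt2")
model = get_peft_model(base, LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"]))

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model, optimizer = accelerator.prepare(model, optimizer)

# ... training loop elided ...

# Under ZeRO-3 the parameters (including the LoRA weights) are partitioned across
# ranks, so gather a full state dict first; with DeepSpeed this may require
# `stage3_gather_16bit_weights_on_model_save: true` in the DeepSpeed config.
state_dict = accelerator.get_state_dict(model)

# PeftModel.save_pretrained filters the state dict down to the adapter weights,
# writing only adapter_config.json + the adapter tensors, not the full model.
accelerator.unwrap_model(model).save_pretrained(
    "outputs/adapter",
    state_dict=state_dict,
    is_main_process=accelerator.is_main_process,
    safe_serialization=True,
)
```

Is there a supported way to get the checkpointing in alignment-handbook to behave like this, instead of dumping the merged/full model?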
Or should I use a MULTI_GPU setup instead of DeepSpeed?
The model being trained is Llama-70B, on 8×A100 40GB, so it’s really not possible to fit the model on a single GPU.