I’m training LoRA adapters for large models using Accelerate + DeepSpeed with ZeRO-3, via the alignment-handbook implementation. However, when saving checkpoints, the full model is being saved, whereas I only need the adapter (and possibly the optimizer states). Is it possible to configure the training so that only the adapters are saved? Roughly, what I’d like is something like the sketch below.
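A minimal sketch of the save pattern I have in mind, assuming the model is a PEFT-wrapped `PeftModel` prepared by Accelerate. The tiny `gpt2` stand-in, the LoRA hyperparameters, and the `outputs/adapter` path are my own placeholders, not alignment-handbook code; the point is the final save call, which gathers the ZeRO-3 shards and then writes only the adapter weights:

```python
import torch
from accelerate import Accelerator
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

accelerator = Accelerator()

# Tiny stand-in model for illustration (the real target is Llama-70B)
base = AutoModelForCausalLM.from_pretrained("gpt2")
model = get_peft_model(base, LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"]))

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model, optimizer = accelerator.prepare(model, optimizer)

# ... training loop elided ...

# Under ZeRO-3 the parameters (including the LoRA weights) are partitioned across
# ranks, so gather a full state dict first; with DeepSpeed this may require
# `stage3_gather_16bit_weights_on_model_save: true` in the DeepSpeed config.
state_dict = accelerator.get_state_dict(model)

# PeftModel.save_pretrained filters the state dict down to the adapter weights,
# writing only adapter_config.json + the adapter tensors, not the full model.
accelerator.unwrap_model(model).save_pretrained(
    "outputs/adapter",
    state_dict=state_dict,
    is_main_process=accelerator.is_main_process,
    safe_serialization=True,
)
```

Is there a supported way to get the checkpointing in alignment-handbook to behave like this, instead of dumping the merged/full model?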
Or should I use a MULTI_GPU setup instead of DeepSpeed?
The model being trained is Llama-70B, on 8×A100 40GB, so it’s really not possible to fit the model on a single GPU.