I tried QLoRA and it works fine for the StarCoder model with a small context length of 1K on a single A100 40GB GPU, using int4 quantization.
But I want to fine-tune with an 8K context length, and even when I specify more GPUs I am not able to push the context length to 8K.
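For reference, this is roughly how I load the model in 4-bit for QLoRA (the model ID, LoRA hyperparameters, and target module names below are illustrative of my setup, not exact values):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit (NF4) quantization config -- values are illustrative of my setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder",            # model ID is illustrative
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA on the attention/projection layers of the GPTBigCode architecture;
# rank/alpha/dropout here are placeholders, not my exact values
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["c_attn", "c_proj"],
)
model = get_peft_model(model, lora_config)
```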
I tried device_map = 'auto', but that didn't work well, so I tried a manual device map:
device_map = {
    'transformer.wte': 0,
    'transformer.wpe': 0,
    'transformer.drop': 0,
    'transformer.h.0': 0,
    'transformer.h.1': 0,
    'transformer.h.2': 1,
    'transformer.h.3': 1,
    'transformer.h.4': 1,
    'transformer.h.5': 1,
    'transformer.h.6': 1,
    'transformer.h.7': 1,
    'transformer.h.8': 1,
    'transformer.h.9': 1,
    'transformer.h.10': 2,
    'transformer.h.11': 2,
    'transformer.h.12': 2,
    'transformer.h.13': 2,
    'transformer.h.14': 2,
    'transformer.h.15': 2,
    'transformer.h.16': 2,
    'transformer.h.17': 3,
    'transformer.h.18': 3,
    'transformer.h.19': 3,
    'transformer.h.20': 3,
    'transformer.h.21': 3,
    'transformer.h.22': 3,
    'transformer.h.23': 3,
    'transformer.h.24': 3,
    'transformer.h.25': 4,
    'transformer.h.26': 4,
    'transformer.h.27': 4,
    'transformer.h.28': 4,
    'transformer.h.29': 4,
    'transformer.h.30': 4,
    'transformer.h.31': 4,
    'transformer.h.32': 4,
    'transformer.h.33': 5,
    'transformer.h.34': 5,
    'transformer.h.35': 5,
    'transformer.h.36': 5,
    'transformer.h.37': 5,
    'transformer.h.38': 5,
    'transformer.h.39': 5,
    'transformer.ln_f': 5,
    'lm_head': 0,
}
This is for 6 GPUs. With it I was able to train with a 6K context length, but not 8K.
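For reference, I pass the manual map when loading the model, roughly like this (device_map is the dict above, and bnb_config is the same 4-bit config as in the first snippet; the variable names are just for illustration):

```python
from transformers import AutoModelForCausalLM

# device_map is the per-layer dict shown above; bnb_config is the 4-bit
# BitsAndBytesConfig from the earlier snippet (names are illustrative)
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder",
    quantization_config=bnb_config,
    device_map=device_map,
)
```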
Why does memory usage increase so rapidly as the context length grows?
I wanted to try CPU offloading with DeepSpeed and FSDP, but when I try, it doesn't work with quantization.
Is it possible to train a model using DeepSpeed or FSDP together with quantization, or not?
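For context, the kind of DeepSpeed CPU-offload setup I was attempting looks roughly like this; the ZeRO stage and offload settings are just what I experimented with, and I have not gotten this to run together with 4-bit quantization:

```python
from transformers import TrainingArguments

# Sketch of the DeepSpeed CPU-offload config I was experimenting with
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "bf16": {"enabled": True},
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

training_args = TrainingArguments(
    output_dir="./starcoder-qlora-8k",  # illustrative output path
    per_device_train_batch_size=1,
    gradient_checkpointing=True,
    deepspeed=ds_config,  # accepts a dict or a path to a JSON config file
)
```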
What am I doing wrong?
Can someone help me with this? Suggestions are greatly appreciated.