I am trying to fine-tune Llama-7B with a batch size of 1 (so the data itself is not the memory issue). I am using DeepSpeed with the HuggingFace Trainer. The problem is that DeepSpeed doesn't let me place the model on a device myself (it handles device placement automatically, and interfering causes OOM errors). As a consequence, any non-training forward passes I run through the model to obtain logits execute on the CPU and take forever.
So what I am trying to do is withhold one GPU from DeepSpeed, so that after DeepSpeed has wrapped the model, I can run forward calls on that reserved GPU.
I have tried using `deepspeed --include localhost:<GPUs I want DeepSpeed to use>`, but this sets `CUDA_VISIBLE_DEVICES` to exclude the withheld GPU, so my process can't see it at all, and DeepSpeed automatically uses every GPU listed in `--include`. Is there any way I can solve this?
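For concreteness, here is roughly what I'm running (the script name, config file, and GPU indices are placeholders; assume a 3-GPU machine where I want to keep GPU 2 free for inference):

```shell
# Launch training with DeepSpeed on GPUs 0 and 1 only,
# hoping to leave GPU 2 free for separate forward passes.
deepspeed --include localhost:0,1 train.py --deepspeed ds_config.json

# Inside the launched processes, the launcher has already set:
#   CUDA_VISIBLE_DEVICES=0,1
# so torch.cuda.device_count() reports 2, GPU 2 is invisible,
# and "cuda:2" cannot be addressed from within the training process.
```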