Fine-Tuning Llama 3.2 1B (Quantized): Memory Requirements

Hi All!
I’m trying to fine-tune a Llama 3.2 1B Instruct model that is quantized during loading, but for some reason the trainer errors out with:
“OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 GiB. GPU 0 has a total capacity of 6.00 GiB of which 3.48 GiB is free.”
I’m not sure why training such a relatively small model would require 32 GB of VRAM.
I’d really appreciate any help; code attached below:

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, TaskType, get_peft_model

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct", num_labels=23, quantization_config=quantization_config)
lora_config = LoraConfig(
    r=4,
    lora_alpha=8,
    lora_dropout=0.1,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
    target_modules = ["q_proj", "k_proj", "v_proj"]
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir='./results',          
    num_train_epochs=1,              
    per_device_train_batch_size=1,   
    per_device_eval_batch_size=16,   
    warmup_steps=500,                
    weight_decay=0.01,               
    logging_dir='./logs',            
    logging_steps=10,
    evaluation_strategy="epoch",     
    gradient_accumulation_steps=4,  
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()  # this call raises the CUDA OutOfMemoryError quoted above

With the given information, it’s hard to tell why so much memory is needed; there is nothing obviously wrong there. It would be helpful if you could share the full code and the full error message.

One common source of excessive memory usage is the data itself. If you have very long sequences, OOMs can easily occur because of attention’s quadratic memory requirement. You could check whether setting a low max_seq_length helps to curb the memory usage.
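
For illustration, a minimal sketch of capping sequence length at tokenization time (the original post doesn’t show the tokenization step, so the column name "text" and the 512-token cap are assumptions):

def tokenize(batch):
    # Truncate every example to at most 512 tokens so no single batch blows up activation memory.
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_dataset = train_dataset.map(tokenize, batched=True)
eval_dataset = eval_dataset.map(tokenize, batched=True)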

Running LLaMA 3.2 locally requires adequate computational resources. Below are the recommended specifications:

Hardware:

GPU: NVIDIA GPU with CUDA support (16GB VRAM or higher recommended).
RAM: At least 32GB (64GB for larger models).
Storage: Minimum 50GB of free disk space for the model and dependencies.

Software:

Operating System: Linux (preferred), macOS, or Windows.
Python: Version 3.8 or higher.
CUDA Toolkit: Required for GPU acceleration (11.6 or newer).
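
If it helps, here is a small sketch (plain PyTorch, nothing Llama-specific) for checking what your local machine reports against these numbers:

import platform
import torch

print("Python:", platform.python_version())           # want 3.8 or higher
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA build:", torch.version.cuda)           # want 11.6 or newer
    props = torch.cuda.get_device_properties(0)
    print("GPU:", props.name)
    print("VRAM (GiB):", round(props.total_memory / 1024**3, 1))  # want 16 or more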

Has anyone fixed this yet? I’m using an AMD Ryzen 5 4600G with Radeon Graphics on my Ubuntu server.
I’ve got 64 GB of RAM, two ASUS GeForce RTX 4060 Ti 16 GB GPUs, and four 1 TB NVMe SSDs. I still keep getting those errors. I’ve tried configuring accelerate at the beginning of the script and setting every environment variable I can find, and CUDA still demands 32 GB.
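
For reference, hiding one of the two cards from the process is the kind of thing those environment variables boil down to; a generic sketch (the device index is arbitrary, and it has to run before anything CUDA-related is imported):

import os

# Expose only the first GPU to this process; set before torch/transformers are imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())  # should now report 1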

As BenjaminB also mentioned, it seems unusual that this model size and these training parameters would require this much memory.
There may therefore be an issue with the library or the model, and the cause may lie in parts other than the visible parameters.

However, if this were a common issue, there would likely be many reports of it. It may be specific to certain conditions, such as a multi-GPU environment, a specific version of the library, or specific hardware.

I’m wondering if it maybe has something to do with using a local dataset instead of one from the Hub, but that’s just a desperate conclusion after many hours of struggling with this issue. I’ll check out the link you provided; hopefully that will provide some insight. Thanks.

I see, the dataset could also be a possible cause…
Well, the best practices for datasets are probably available in this forum or on GitHub if you search for them…:sweat_smile:

Also, depending on the model, gradient checkpointing may not be supported (I think it should be available for Llama 3.2 1B, though…), and there may still be some potential bugs in multi-GPU environments.
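
If it is supported, it is cheap to try. A sketch, assuming the same quantized model object as in the original post; prepare_model_for_kbit_training is peft’s helper for quantized models and is typically applied before get_peft_model:

from peft import prepare_model_for_kbit_training

# Enables gradient checkpointing by default for quantized models, recomputing
# activations in the backward pass instead of storing them (extra compute, less memory).
model = prepare_model_for_kbit_training(model)

# The same effect can also be requested through the trainer configuration:
# training_args = TrainingArguments(..., gradient_checkpointing=True)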

When trying to isolate the issue, it’s usually faster to temporarily switch to a smaller, simpler model or dataset.
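
For example (a hypothetical sketch; the tiny checkpoint name is just a commonly used debugging stand-in, and the 100-row slice is arbitrary):

# Swap in a tiny model and a small slice of the data to see whether the OOM
# follows the model/quantization setup or the dataset pipeline.
debug_model = AutoModelForCausalLM.from_pretrained("sshleifer/tiny-gpt2")
debug_train = train_dataset.select(range(100))  # datasets.Dataset.select keeps only the first 100 rows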