Stanford Alpaca on 2x3090

So I am attempting to do the Stanford Alpaca training that they describe here:

I have a workstation with 512GB of RAM and 2x 3090s with 24GB of VRAM each. I have reached the point where I am trying to train it, but I keep getting out-of-memory errors. I know they used 4x A100s with 80GB of VRAM. I tried changing the number of GPUs to 2 and reducing the batch sizes (see below for my torchrun command with args). I have also tried using bitsandbytes to quantize down to 8-bit, but I am having problems getting that to run (a rough sketch of what I was attempting is included after the command below).

Any suggestions?

torchrun --nproc_per_node=2 --master_port=13833 train.py \
--model_name_or_path decapoda-research/llama-7b-hf \
--data_path ./alpaca_data.json \
--bf16 True \
--output_dir /home/dsa/stanford_alpaca \
--num_train_epochs 3 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 32 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 2000 \
--save_total_limit 1 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'LLaMADecoderLayer' \
--tf32 True
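
For reference, this is roughly the shape of the 8-bit load I was attempting with bitsandbytes (a minimal sketch, not my exact script; it assumes a transformers build that supports load_in_8bit and device_map, with bitsandbytes installed):

# Simplified sketch of the bitsandbytes 8-bit load, not my actual training script.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "decapoda-research/llama-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# load_in_8bit quantizes the linear layers via bitsandbytes at load time;
# device_map="auto" spreads the weights across both 3090s.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
)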
