Hi, everyone.
I’m currently trying to fine-tune the meta-llama/Meta-Llama-3-8B-Instruct model using QLoRA. The fine-tuning completed successfully in an Amazon SageMaker Training Job on an ml.g5.8xlarge instance.
However, when I ran the same code on an ml.p4d.24xlarge instance outside of a Training Job, I hit a CUDA out-of-memory error and could not complete the fine-tuning. Is it possible that SageMaker Training Jobs significantly reduce VRAM usage? Also, how much VRAM is actually required to fine-tune meta-llama/Meta-Llama-3-8B-Instruct using QLoRA with 4-bit or 8-bit quantization?
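For context, here is my back-of-envelope estimate of the weight memory alone (this ignores LoRA adapters, optimizer states, gradients, and activations, which I don't know how to account for):

n_params = 8e9  # rough Meta-Llama-3-8B parameter count
print(f"8-bit weights: {n_params * 1.0 / 2**30:.1f} GiB")  # ~7.5 GiB
print(f"4-bit weights: {n_params * 0.5 / 2**30:.1f} GiB")  # ~3.7 GiB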
I would greatly appreciate any help you can provide.
< model & LoRA config >
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_int8_training

# Load the base model in 8-bit and shard it across the available GPUs
model = AutoModelForCausalLM.from_pretrained(
    args.model_id,
    use_cache=False if args.gradient_checkpointing else True,  # KV cache conflicts with gradient checkpointing
    device_map="auto",
    load_in_8bit=True,
)

# Attach LoRA adapters to all attention and MLP projection layers
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
)

# Cast layer norms and the LM head to fp32 and enable input gradients for stable 8-bit training
# (newer peft versions replace this with prepare_model_for_kbit_training)
model = prepare_model_for_int8_training(model)
model = get_peft_model(model, peft_config)
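In case it matters, this is the 4-bit variant I was planning to try, based on my understanding of the QLoRA recipe (NF4 via BitsAndBytesConfig); I haven't verified these exact settings myself:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 data type from the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # bf16 compute (supported on A10G and A100)
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    args.model_id,
    quantization_config=bnb_config,
    device_map="auto",
    use_cache=False if args.gradient_checkpointing else True,
)

# prepare_model_for_kbit_training is the k-bit (4-bit/8-bit) counterpart
# of prepare_model_for_int8_training
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)  # peft_config as defined above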