RuntimeError with Mixed Precision During LoRA Fine-Tuning of LLaVA on a Small GPU

Hi everyone,

I'm facing an issue while fine-tuning the LLaVA model with LoRA on a machine with limited GPU resources. To fit the small GPU, I've been experimenting with 4-bit precision. However, I consistently hit the following error:

RuntimeError: expected scalar type BFloat16 but found Float
The error is raised inside the vision model, specifically during the LayerNorm operation in its forward pass.

Key Configuration:

  • Model: liuhaotian/llava-v1.6-vicuna-7b
  • Vision Tower: openai/clip-vit-large-patch14-336
  • LoRA: Enabled with lora_r=128, lora_alpha=256
  • Precision: 4-bit (bits=4)
  • Other Settings: bf16=True, gradient_checkpointing=True
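
For context, here is a minimal sketch of how a roughly equivalent setup looks when wired up directly with transformers/PEFT (the LLaVA training scripts assemble this differently; the dropout and target_modules values below are illustrative, not from my run):

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit quantization with bfloat16 compute, roughly matching bits=4 + bf16=True
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype should match bf16 training
)

# LoRA settings matching lora_r=128, lora_alpha=256
lora_config = LoraConfig(
    r=128,
    lora_alpha=256,
    lora_dropout=0.05,                                        # assumption: not stated above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption: typical targets
    task_type="CAUSAL_LM",
)
```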

Problem:

I'm running into a data type mismatch: some layers (e.g., LayerNorm) expect BFloat16 but receive Float32, which triggers the error. When I inspect the model, I find a mix of data types across the layers:

  • 166 layers in float32
  • 744 layers in bfloat16
  • 369 layers in uint8
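
Those counts came from a quick dtype inspection; a snippet along these lines (assuming `model` is the loaded LLaVA model) reproduces that kind of breakdown:

```python
from collections import Counter

# Tally parameter tensors by storage dtype to see what ended up in
# float32, bfloat16, or quantized uint8 storage.
dtype_counts = Counter(p.dtype for p in model.parameters())
for dtype, count in dtype_counts.items():
    print(f"{count} parameter tensors in {dtype}")
```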

My Situation:

I'm trying to modify LLaVA for my own use case and need to run it in a "debug mode" to test and tweak the code. Since I have limited GPU resources, I'm using low precision (4-bit) to make debugging feasible. However, this data type mismatch is proving to be a roadblock.

My Questions:

  • How can I debug or fine-tune LLAVA with LoRA on a small GPU without running into these precision-related errors?
  • Should I be manually converting specific layers to avoid the mismatch between bfloat16 and float32? (See the sketch after this list for what I mean.)
  • Is there a general approach to running LoRA fine-tuning in a lightweight "debug mode" for code experimentation, where output quality and precision mismatches don't matter?
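
To clarify the second question, by "manually converting specific layers" I mean something like the sketch below, which would force every remaining non-quantized float32 tensor (LayerNorm weights included) to bfloat16 so the vision tower sees a single dtype. I'm not sure whether this is numerically safe:

```python
import torch

def cast_float32_to_bf16(model, dtype=torch.bfloat16):
    """Downcast non-quantized float32 tensors (e.g. LayerNorm weights) to `dtype`.

    Note: I believe peft's prepare_model_for_kbit_training goes the other way
    (upcasting norms to float32 for stability), so this downcast may hurt
    training quality even if it removes the dtype-mismatch error.
    """
    for module in model.modules():
        if isinstance(module, torch.nn.LayerNorm):
            module.to(dtype)
    for param in model.parameters():
        if param.dtype == torch.float32:
            param.data = param.data.to(dtype)
    return model
```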

Any guidance or suggestions would be greatly appreciated!

Thanks in advance!


It seems like torch's autocast is doing something odd; that, or a CUDA version mismatch, is the most common cause.
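
One quick way to check whether autocast is involved (just a sketch; `model` and `batch` are placeholders for whatever your training step uses) is to run the forward pass under an explicit bfloat16 autocast context and see whether the LayerNorm error goes away:

```python
import torch

# Under CUDA autocast, ops such as layer_norm get a consistent compute dtype
# inserted automatically, so a float32 LayerNorm weight and a bfloat16
# activation no longer collide inside the vision tower.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    outputs = model(**batch)
    loss = outputs.loss
```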