Hi everyone,
I'm facing an issue while fine-tuning the LLaVA model using LoRA on a machine with limited GPU resources. To accommodate the small GPU, I've been experimenting with 4-bit precision. However, I consistently encounter the following error:
`RuntimeError: expected scalar type BFloat16 but found Float`
The error is raised in the vision model, specifically during the LayerNorm operation in the forward pass.
Key Configuration:
- Model: `liuhaotian/llava-v1.6-vicuna-7b`
- Vision Tower: `openai/clip-vit-large-patch14-336`
- LoRA: enabled with `lora_r=128`, `lora_alpha=256`
- Precision: 4-bit (`bits=4`)
- Other Settings: `bf16=True`, `gradient_checkpointing=True`
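For reference, my understanding is that these settings correspond roughly to the quantization and LoRA configuration below (just a sketch using `transformers` and `peft`; the exact wiring inside LLaVA's training script may differ, and the `lora_dropout`, `target_modules`, and 4-bit quantization details are assumptions on my part rather than settings I reported above):

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit weights with bf16 compute, as implied by bits=4 and bf16=True
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # assumption
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,          # assumption
)

# LoRA settings matching lora_r=128, lora_alpha=256
lora_config = LoraConfig(
    r=128,
    lora_alpha=256,
    lora_dropout=0.05,                                        # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # illustrative guess
    task_type="CAUSAL_LM",
)
```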
Problem:
I'm running into a data type mismatch where some layers (e.g., LayerNorm) receive bfloat16 inputs but still hold float32 weights, which triggers the error. When I inspect the model (see the snippet below the list), I find a mix of data types across the layers:
- 166 layers in float32
- 744 layers in bfloat16
- 369 layers in uint8
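For completeness, this is roughly how I'm inspecting the dtypes (a quick sketch; `summarize_param_dtypes` is just a helper I wrote, and it counts parameters rather than whole modules):

```python
from collections import Counter

import torch


def summarize_param_dtypes(model: torch.nn.Module) -> None:
    """Print how many parameters the model holds in each dtype."""
    dtype_counts = Counter(p.dtype for p in model.parameters())
    for dtype, count in dtype_counts.items():
        print(dtype, count)

    # The float32 entries are the interesting ones here; LayerNorm weights
    # from the vision tower tend to show up in this list.
    for name, param in model.named_parameters():
        if param.dtype == torch.float32:
            print("float32 param:", name)
```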
My Situation:
I'm trying to modify LLaVA for my own use case and need to run it in a "debug mode" to test and tweak the code. Since I have limited GPU resources, I'm using low precision (4-bit) to make debugging feasible. However, this data type mismatch is proving to be a roadblock.
My Questions:
- How can I debug or fine-tune LLaVA with LoRA on a small GPU without running into these precision-related errors?
- Should I be manually converting specific layers (e.g., the LayerNorms, along the lines of the sketch below this list) to avoid the mismatch between bfloat16 and float32?
- Is there a general approach to running LoRA fine-tuning in a lightweight "debug mode" for code experimentation, without worrying about output quality or precision mismatches?
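For the second question, this is the kind of manual cast I've been considering, but it's only a sketch and I'm not sure it's the right fix. My (possibly wrong) understanding is that the float32 norms come from the k-bit training preparation deliberately upcasting them for numerical stability, so forcing them back down may just trade the dtype error for training instability:

```python
import torch


def cast_layernorms(model: torch.nn.Module, dtype: torch.dtype = torch.bfloat16) -> None:
    """Cast every LayerNorm's parameters to `dtype` so they match the bf16 activations."""
    for module in model.modules():
        if isinstance(module, torch.nn.LayerNorm):
            module.to(dtype)
```

(I'd presumably call this on the vision tower right after the k-bit preparation step, but again, that's a guess.)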
Any guidance or suggestions would be greatly appreciated!
Thanks in advance!