Getting RuntimeError: expected scalar type Half but found Float on AWS P3 instances

Hi All,

I have a simple script that takes an opt-6.7b model and fine-tunes it. When I run this code in Google Colab (Tesla T4, 16GB) it runs without any problem. But when I try to run the same code in an AWS p3.2xlarge environment (Tesla V100 GPU, 16GB) it gives the error:

RuntimeError: expected scalar type Half but found Float

To be able to run the fine-tuning on a single GPU I use LoRA and peft, which are installed exactly the same way (pip install) in both cases. If I wrap the training step in `with torch.autocast("cuda"):`, that error vanishes, but the training loss becomes very strange: it does not gradually decrease but fluctuates within a large range (0-5), and if I change the model to GPT-J the loss always stays at 0. On Colab, by contrast, the loss decreases gradually. So I am not sure whether using `with torch.autocast("cuda"):` is a good idea or not.
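For reference, this is roughly the pattern I mean. A toy, runnable sketch of the autocast + GradScaler recipe (tiny model and random data standing in for opt-6.7b and the real batches); on a CPU-only machine both autocast and the scaler are simply disabled:

```python
import torch
import torch.nn.functional as F

# Toy sketch of mixed-precision training; the real script would wrap the
# opt-6.7b forward pass and loss the same way.
use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
model = torch.nn.Linear(16, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

x, y = torch.randn(8, 16, device=device), torch.randn(8, 1, device=device)
for _ in range(3):
    optimizer.zero_grad()
    with torch.autocast("cuda", enabled=use_cuda):
        loss = F.mse_loss(model(x), y)  # forward runs in fp16 under autocast
    scaler.scale(loss).backward()       # loss scaling avoids fp16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
```

Note that autocast alone, without a GradScaler, can produce exactly the kind of erratic loss described above when fp16 gradients underflow.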

The transformers version is 4.28.0.dev0 in both cases. The torch version on Colab shows 1.13.1+cu116, whereas on p3 it shows 1.13.1 (does this mean it does not have CUDA support? I doubt it; on top of that, torch.cuda.is_available() returns True).
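A quick way to settle the CUDA-support question: the `+cu116` suffix is just the wheel's local version tag and can be absent on conda-installed builds, so `torch.version.cuda` is the more reliable field to check:

```python
import torch

# Sanity-check which CUDA runtime the installed torch build was compiled
# against. torch.version.cuda is None only for genuinely CPU-only builds.
print(torch.__version__)          # e.g. 1.13.1+cu116 or just 1.13.1
print(torch.version.cuda)         # e.g. 11.6, or None for CPU-only builds
print(torch.cuda.is_available())  # True means the build found a usable GPU
```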

The only large difference I can see is that on Colab, bitsandbytes prints the following setup log:

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 118

Whereas on p3 it prints the following:

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /opt/conda/envs/pytorch/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /opt/conda/envs/pytorch/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117_nocublaslt.so...

What am I missing? I am not posting the code here, but it is really a very basic version that takes opt-6.7b and fine-tunes it on the Alpaca dataset using LoRA and peft.
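One thing worth checking in that basic setup: this particular "Half but found Float" error often comes from a few fp16 parameters (LayerNorm weights, biases) mixing with fp32 activations. The sketch below shows, on a toy model standing in for opt-6.7b, the spirit of what peft's int8-training preparation helper does, namely casting the small one-dimensional parameters back to fp32:

```python
import torch

# Toy stand-in for opt-6.7b: a Linear layer followed by LayerNorm, cast to
# fp16 wholesale as a half-precision load would do.
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.LayerNorm(4)).half()

# Cast the 1-D parameters (biases, LayerNorm weights) back to fp32 so that
# autograd never has to multiply Half by Float in those layers. The big 2-D
# weight matrices stay in fp16.
for param in model.parameters():
    if param.ndim == 1:
        param.data = param.data.to(torch.float32)
```

If the Colab script happens to include such a preparation step (or a helper that performs it) and the p3 script does not, that alone could explain the difference, so it is worth diffing the two setups for it.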

Why does it run on Colab but not on p3? Any help is welcome :slight_smile:

(Sorry if this is a very novice question; well, that is why I am posting in the beginners section.)

Hey, I just ran into this.

I first tried an A100 GPU and it worked fine, then used a V100 GPU and this error came up. I backtracked through what I had changed, and everything worked fine once I went back to the better GPU. Hope this helps!