When I load a large pretrained model such as T5-XXL with `device_map="auto"` and `torch_dtype=torch.float16`, Accelerate always insists on including my CPU, even though I have enough GPU RAM (48 GB). How do I constrain Accelerate to use only my GPUs? I tried setting the `device_map` manually, spreading the layers over the GPUs, but that raised an error.