BitsAndBytes With DDP

Hi all, I'm trying to use Gemma with a BitsAndBytesConfig for quantization, then wrapping it in a PyTorch Lightning module and running inference with the DDP strategy. I keep getting this error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
I've hit the same issue with LlamaForConditionalGeneration. For reference: if I don't pass quantization_config to .from_pretrained() and leave the rest of the code exactly the same, everything works correctly.

Code for reference:

import torch
from pytorch_lightning import Trainer
from transformers import BitsAndBytesConfig, PaliGemmaForConditionalGeneration

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # the parameter is bnb_4bit_compute_dtype, not compute_type
)

gemma = PaliGemmaForConditionalGeneration.from_pretrained(
    FINETUNED_MODEL_ID,
    quantization_config=bnb_config,
)

model = gem(model=gemma, processor=processor, config=config)

trainer = Trainer(
    accelerator='gpu',
    devices=7,
    strategy='ddp', 
    # fast_dev_run=True,
)
predictions = trainer.predict(model, dataloaders=dm.val_dataloader())

gem is just my PyTorch Lightning wrapper; it works fine without the BitsAndBytes config.
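
For reference, here is a minimal sketch of what such a wrapper might look like (the real gem class isn't shown in this post, so the predict_step body and config.max_new_tokens below are assumptions):

import pytorch_lightning as pl

class gem(pl.LightningModule):
    # Hypothetical reconstruction; the actual gem wrapper is not shown in this post.
    def __init__(self, model, processor, config):
        super().__init__()
        self.model = model
        self.processor = processor
        self.config = config

    def predict_step(self, batch, batch_idx):
        # Assumes the dataloader yields the processor's output (input_ids, pixel_values, ...).
        generated_ids = self.model.generate(**batch, max_new_tokens=self.config.max_new_tokens)
        return self.processor.batch_decode(generated_ids, skip_special_tokens=True)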

Please let me know if anyone else is experiencing this issue as well.


Maybe:

model = gem(model=gemma, processor=processor, config=config)
model.to("cuda")

Same error. I tried both, including:
gemma = PaliGemmaForConditionalGeneration.from_pretrained(
    FINETUNED_MODEL_ID,
    quantization_config=bnb_config,
    device_map="cpu",
)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)

How about this? In any case, as long as all the models and model components are moved .to("cuda") before the call that raises the error, it should be fine. If you have LoRA or anything else outside of this code, you can send that to CUDA as well.

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

gemma = PaliGemmaForConditionalGeneration.from_pretrained(
    FINETUNED_MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
).to("cuda")

model = gem(model=gemma, processor=processor, config=config)

trainer = Trainer(
    accelerator='gpu',
    devices=7,
    strategy='ddp', 
    # fast_dev_run=True,
)
predictions = trainer.predict(model, dataloaders=dm.val_dataloader())
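
One more thing worth trying if .to("cuda") is refused (recent transformers versions block moving 4-bit bitsandbytes models): device_map="auto" shards the model across all visible GPUs, which clashes with DDP, where each rank needs the whole model on its own single device. A common workaround is to map the entire model to each process's local GPU. A minimal sketch, assuming the LOCAL_RANK environment variable that DDP launchers such as torchrun set:

import os

# Pin the whole model ("" = root module) to this rank's GPU instead of sharding with "auto".
# LOCAL_RANK is set per process by DDP launchers; default to 0 for single-process runs.
local_rank = int(os.environ.get("LOCAL_RANK", 0))
gemma = PaliGemmaForConditionalGeneration.from_pretrained(
    FINETUNED_MODEL_ID,
    quantization_config=bnb_config,
    device_map={"": local_rank},
)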