ozoloev
November 29, 2024, 5:14pm
I just have a torch DataLoader and do multi-GPU inference with Accelerate. Each batch in the loader has three fields: input_ids, attention_mask, and user_ids. After get_inference_dataset these fields are all still in the loader, but after .prepare I no longer have the user_ids field, only the ids and the mask.
inference_loader = get_inference_dataset(config, tokenizer)
print(next(iter(inference_loader))["user_ids"])
inference_loader = accelerator.prepare_data_loader(inference_loader)
print(next(iter(inference_loader))["user_ids"])
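One possible workaround, assuming the problem is that the prepared loader only carries tensor fields when it shards batches across processes (user_ids here being a plain Python list), is to return user_ids as a tensor from the collate function so it is treated the same way as input_ids and attention_mask. A minimal sketch; collate_fn and the integer user IDs are assumptions, not part of the original code:

import torch

# Hypothetical collate function: batch user_ids as a tensor (assuming they
# are integers) so the field is sharded by accelerator.prepare_data_loader
# just like the other tensor fields, instead of being dropped.
def collate_fn(examples):
    return {
        "input_ids": torch.stack([ex["input_ids"] for ex in examples]),
        "attention_mask": torch.stack([ex["attention_mask"] for ex in examples]),
        "user_ids": torch.tensor([ex["user_ids"] for ex in examples], dtype=torch.long),
    }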
This seems to happen because of drop_last=True. I don't know if this is the case.
From a related GitHub issue (opened 9 Jan 2024, closed 17 Apr 2024):
In the [tutorial](https://huggingface.co/docs/accelerate/quicktour), it is mentioned that `Some data at the end of the dataset may be duplicated so the batch can be divided equally among all workers.` So if my train dataset size is not divisible by the number of GPUs, the dataloader after prepare() will include duplicated data during training? Won't this affect model performance (loss etc.), since it adds extra data to the train dataset? If it causes a large difference, is there any way to exclude these duplicated samples when calculating the loss?
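For the evaluation side of this, Accelerate's gather_for_metrics is the documented way to drop the duplicated tail samples before computing metrics. A minimal sketch, assuming model and eval_loader come from accelerator.prepare(...) and that each batch carries a labels field (those names are assumptions here):

import torch

all_predictions, all_labels = [], []
model.eval()
for batch in eval_loader:
    with torch.no_grad():
        logits = model(batch["input_ids"], attention_mask=batch["attention_mask"]).logits
    preds = logits.argmax(dim=-1)
    # gather_for_metrics gathers across processes AND drops the samples that
    # were duplicated to make the dataset divisible by the number of GPUs
    preds, labels = accelerator.gather_for_metrics((preds, batch["labels"]))
    all_predictions.append(preds)
    all_labels.append(labels)

predictions = torch.cat(all_predictions)
labels = torch.cat(all_labels)

For the training loss itself there is no such built-in deduplication, but the duplicated samples are at most one extra batch per epoch, so their effect on the loss is usually negligible.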