### System Info
- `transformers` version: 4.33.3
- Platform: Linux-5.10.186-…179.751.amzn2.x86_64-x86_64-with-glibc2.10
- Python version: 3.8.17
- Huggingface_hub version: 0.17.3
- Safetensors version: 0.3.3
- Accelerate version: 0.23.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.0.1+cu117 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: A100
- Using distributed or parallel set-up in script?: torchrun --nproc-per-node 2 script.py
### Who can help?
@muellerzr, @pacman100
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
### Reproduction
```
import torch
from torch.utils.data import IterableDataset

from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

data = [
    {
        "input_ids": torch.tensor([101, 2040, 2001, 1999, 14936, 102]),
        "token_type_ids": torch.tensor([0, 0, 0, 0, 0, 0]),
        "attention_mask": torch.tensor([1, 1, 1, 1, 1, 1]),
    },
    {
        "input_ids": torch.tensor([101, 2040, 102]),
        "token_type_ids": torch.tensor([0, 0, 0]),
        "attention_mask": torch.tensor([1, 1, 1]),
    },
    {
        "input_ids": torch.tensor([101, 2040, 2001, 1999]),
        "token_type_ids": torch.tensor([0, 0, 0, 0]),
        "attention_mask": torch.tensor([1, 1, 1, 1]),
    },
    {
        "input_ids": torch.tensor([101, 2040, 2001, 1999, 14936, 102]),
        "token_type_ids": torch.tensor([0, 0, 0, 0, 0, 0]),
        "attention_mask": torch.tensor([1, 1, 1, 1, 1, 1]),
    },
    {
        "input_ids": torch.tensor([101]),
        "token_type_ids": torch.tensor([0]),
        "attention_mask": torch.tensor([1]),
    },
    {
        "input_ids": torch.tensor([101]),
        "token_type_ids": torch.tensor([0]),
        "attention_mask": torch.tensor([1]),
    },
]


class ExampleDataset(IterableDataset):
    def __init__(self, data):
        super().__init__()
        self.data = data * 20

    def __iter__(self):
        for x in self.data:
            yield x

    def __len__(self):
        return len(self.data)


tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")

train_args = TrainingArguments(
    output_dir="output",
    num_train_epochs=3,
    per_device_train_batch_size=2,
)
dc = DataCollatorForLanguageModeling(tokenizer=tokenizer)
trainer = Trainer(
    train_dataset=ExampleDataset(data),
    model=model,
    args=train_args,
    data_collator=dc,
)
trainer.train()
```
I run the above script with `torchrun --nproc-per-node 2 script.py`, which results in the following error:
```
Traceback (most recent call last):
File "fm_model/data/scratch.py", line 242, in <module>
trainer.train()
File "/opt/conda/envs/fmmodel/lib/python3.8/site-packages/transformers/trainer.py", line 1556, in train
return inner_training_loop(
File "/opt/conda/envs/fmmodel/lib/python3.8/site-packages/transformers/trainer.py", line 1816, in _inner_training_loop
for step, inputs in enumerate(epoch_iterator):
File "/opt/conda/envs/fmmodel/lib/python3.8/site-packages/accelerate/data_loader.py", line 597, in __iter__
next_batch, next_batch_info = self._fetch_batches(main_iterator)
File "/opt/conda/envs/fmmodel/lib/python3.8/site-packages/accelerate/data_loader.py", line 528, in _fetch_batches
batch = concatenate(batches, dim=0)
File "/opt/conda/envs/fmmodel/lib/python3.8/site-packages/accelerate/utils/operations.py", line 496, in concatenate
return type(data[0])({k: concatenate([d[k] for d in data], dim=dim) for k in data[0].keys()})
File "/opt/conda/envs/fmmodel/lib/python3.8/site-packages/accelerate/utils/operations.py", line 496, in <dictcomp>
return type(data[0])({k: concatenate([d[k] for d in data], dim=dim) for k in data[0].keys()})
File "/opt/conda/envs/fmmodel/lib/python3.8/site-packages/accelerate/utils/operations.py", line 499, in concatenate
return torch.cat(data, dim=dim)
RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 1 but got size 6 for tensor number 1 in the list.
```
This happens because `Trainer` exposes no argument for preparing the dataloader with [split_batches](https://github.com/huggingface/accelerate/blob/48d96319e0033fb8c8979072d97edf3995639029/src/accelerate/data_loader.py#L515), so the run fails at this [line](https://github.com/huggingface/accelerate/blob/69e4c3c54da3201eda288b500d138761e7a5221c/src/accelerate/data_loader.py#L481): the per-process batches are not padded to a common length before they are concatenated.
To use an iterable dataset with `Trainer` in a distributed setup, something likely needs to change in `accelerate` or in `Trainer` so that distributed dataloading works when the fetched batches have different sequence lengths. A possible user-side workaround is sketched below.
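As a stopgap (not a fix in `Trainer` or `accelerate`), one can pad every batch up to a fixed sequence length so that all per-process batches concatenate cleanly. A minimal sketch, assuming a hypothetical `FixedLengthCollator` wrapper and an arbitrary `max_length=128`:
```
import torch
from transformers import AutoTokenizer, DataCollatorForLanguageModeling


class FixedLengthCollator:
    """Hypothetical workaround: wrap a collator and right-pad each batch to a fixed length."""

    def __init__(self, base_collator, pad_token_id, max_length=128):
        self.base_collator = base_collator
        self.pad_token_id = pad_token_id
        self.max_length = max_length

    def __call__(self, features):
        # The base collator only pads to the longest example in *this* batch.
        batch = self.base_collator(features)
        pad_len = self.max_length - batch["input_ids"].shape[1]
        if pad_len > 0:
            def pad(t, value):
                return torch.nn.functional.pad(t, (0, pad_len), value=value)

            batch["input_ids"] = pad(batch["input_ids"], self.pad_token_id)
            batch["attention_mask"] = pad(batch["attention_mask"], 0)
            if "token_type_ids" in batch:
                batch["token_type_ids"] = pad(batch["token_type_ids"], 0)
            if "labels" in batch:
                batch["labels"] = pad(batch["labels"], -100)  # ignore index for the MLM loss
        return batch


tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
dc = FixedLengthCollator(
    DataCollatorForLanguageModeling(tokenizer=tokenizer),
    pad_token_id=tokenizer.pad_token_id,
    max_length=128,
)
```
Passing this wrapper as `data_collator` in the reproduction above avoids the shape mismatch, at the cost of always padding to `max_length`.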
### Expected behavior
1. Automatic padding in `accelerate` when the fetched batches have different sequence lengths
OR
2. A way to specify `split_batches` so that one full batch is produced and then split across the processes (see the sketch below)
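For illustration, a minimal sketch of what option 2 looks like outside of `Trainer`, assuming `ExampleDataset`, `data`, and the collator `dc` from the reproduction above (the global batch size of 4 is an arbitrary choice):
```
from accelerate import Accelerator
from torch.utils.data import DataLoader

# With split_batches=True, the dispatching process builds one full batch and
# slices it across processes, so every shard has the same sequence length and
# no cross-batch concatenation is needed.
accelerator = Accelerator(split_batches=True)
loader = accelerator.prepare(
    DataLoader(ExampleDataset(data), batch_size=4, collate_fn=dc)
)

for batch in loader:
    pass  # each process receives a slice of the same padded batch
```
Exposing something equivalent through `TrainingArguments` (or padding across batches in `accelerate` itself, as in option 1) would make iterable datasets with variable-length batches usable with `Trainer` in distributed runs.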