Hi all,
I have noticed that DataCollatorForLanguageModeling behaves differently with respect to padding depending on whether dicts or plain token sequences are passed to it:
- If a sequence of Mapping objects is passed, the collator calls the tokenizer's pad method (via the pad_without_fast_tokenizer_warning function) without checking whether padding is actually needed.
- Otherwise, it calls _torch_collate_batch, which first checks whether padding is necessary at all (roughly sketched below).
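If I read the code correctly, the dispatch boils down to something like this (my own simplified paraphrase, not the actual transformers implementation; the exact code differs across versions):

```python
from collections.abc import Mapping

import torch


def collate_like_the_lm_collator(examples, tokenizer):
    """Simplified paraphrase of DataCollatorForLanguageModeling's two code paths."""
    if isinstance(examples[0], Mapping):
        # Dict path: tokenizer.pad() is called unconditionally, and it raises
        # "Asking to pad but the tokenizer does not have a padding token"
        # whenever pad_token is unset, even if no padding would be needed.
        return tokenizer.pad(examples, return_tensors="pt")

    # Plain-tokens path (_torch_collate_batch): pad only when lengths actually differ,
    # so equal-length batches never need a pad token.
    tensors = [torch.tensor(e, dtype=torch.long) for e in examples]
    if all(t.size(0) == tensors[0].size(0) for t in tensors):
        return {"input_ids": torch.stack(tensors)}
    if tokenizer.pad_token is None:
        raise ValueError("Attempting to pad samples but the tokenizer has no pad token.")
    max_len = max(t.size(0) for t in tensors)
    padded = torch.full((len(tensors), max_len), tokenizer.pad_token_id, dtype=torch.long)
    for i, t in enumerate(tensors):
        padded[i, : t.size(0)] = t  # right-padding here; the real code honors tokenizer.padding_side
    return {"input_ids": padded}
```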
So my question is: is this difference intentional?
The reason I am asking is that for my LM training I already have packed sequences of tokens (so no padding is required), but some tokenizers (LLaMA-3, for example) do not have a padding token, and the code fails in one case but not in the other.
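To make it concrete, here is a minimal reproduction sketch (the model id is just an example of a tokenizer whose pad_token is None and assumes access to that repo; any such tokenizer shows the same asymmetry):

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
assert tokenizer.pad_token is None

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Packed sequences: every example already has the same length, so no padding is needed.
packed = [[1, 2, 3, 4], [5, 6, 7, 8]]

# Plain token lists -> _torch_collate_batch sees equal lengths and just stacks them.
batch = collator(packed)
print(batch["input_ids"].shape)  # torch.Size([2, 4]) -- works without a pad token

# The same data wrapped in dicts -> tokenizer.pad() is called unconditionally and raises
# "Asking to pad but the tokenizer does not have a padding token".
batch = collator([{"input_ids": ids} for ids in packed])  # ValueError
```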