Does T5 truncate input longer than 512 internally?

Thank you! I also wonder whether TPU training can support this “group by length” trick. The docs say that TPUs do not support dynamic shapes, and I guess that when each batch has a different sequence length dimension, that counts as dynamic.
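
To make the concern concrete, here is a toy sketch in plain Python (no TPU, tokenizer, or Trainer involved). `pad_batch` and `batch_shape` are made-up helpers for illustration, not library functions, and the target length of 32 is an arbitrary assumption. It only shows how padding each length-grouped batch to its own longest sequence gives a different shape per batch, while padding everything to one fixed length gives a single shape.

```python
def pad_batch(batch, pad_id=0, target_len=None):
    """Pad (or cut) every sequence in `batch` to `target_len`.

    If `target_len` is None, pad to the longest sequence in this batch.
    """
    if target_len is None:
        target_len = max(len(seq) for seq in batch)
    return [seq[:target_len] + [pad_id] * (target_len - len(seq)) for seq in batch]


def batch_shape(batch):
    """Return (batch_size, sequence_length) for an already padded batch."""
    return (len(batch), len(batch[0]))


# Toy token-id sequences, ordered so that similar lengths end up in the same batch.
sequences = [[1] * n for n in (3, 4, 9, 10, 30, 31)]
batches = [sequences[i:i + 2] for i in range(0, len(sequences), 2)]

# Pad each batch to its own longest sequence: the shape changes per batch.
dynamic_shapes = [batch_shape(pad_batch(b)) for b in batches]

# Pad every batch to one fixed length: the shape is the same for all batches.
fixed_shapes = [batch_shape(pad_batch(b, target_len=32)) for b in batches]

print(dynamic_shapes)
print(fixed_shapes)
```

If varying shapes like the first list are what the docs mean by "dynamic", then the grouping would only help on TPU when combined with some fixed (or small set of fixed) padded lengths, but that is my guess rather than something I have confirmed.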