Thank you! I'm also wondering whether TPU training can support this “group by length” trick. The docs say TPUs do not support dynamic shapes, and I assume that batches whose sequence-length dimension differs from one batch to the next count as dynamic.
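To make the concern concrete, here is a minimal sketch of what I mean. The lengths are made up, and `group_by_length` and `padded_shape` are hypothetical helpers that only imitate the idea (sort by length, then pad each batch to its own longest sequence); they are not the library's actual implementation.

```python
# Toy sequence lengths; these numbers are invented for illustration.
lengths = [5, 37, 8, 41, 6, 39, 7, 40]
batch_size = 2


def group_by_length(lengths, batch_size):
    """Simplified stand-in for grouping: sort by length, then cut into batches."""
    ordered = sorted(lengths)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def padded_shape(batch):
    """Shape of a batch once every sequence is padded to the longest one in it."""
    return (len(batch), max(batch))


batches = group_by_length(lengths, batch_size)
shapes = [padded_shape(b) for b in batches]

# Each batch ends up with its own sequence-length dimension.
print(shapes)
print("distinct shapes:", len(set(shapes)))
```

In this toy case every batch has a different second dimension, which is the situation I suspect counts as a dynamic shape.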