Reformer Model / Fixed default num_buckets

Hi,

Following PR https://github.com/huggingface/transformers/pull/4564, if the num_buckets hyperparameter is not provided, it is computed to a good default value according to the formula given in the paper (see page 4 of https://arxiv.org/pdf/2001.04451.pdf).

However, from my understanding, the HuggingFace implementation uses the sequence length of the 1st batch instead of the maximum sequence length. This would result in a lower num_buckets than the paper's formula prescribes.
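For reference, here is a minimal sketch of the formula as I understand it (the function name and the chunk length of 64 are illustrative, not the actual HuggingFace code):

```python
import math

def suggested_num_buckets(sequence_length: int, chunk_length: int) -> int:
    # Paper recommendation (roughly): num_buckets ~ 2 * sequence_length / chunk_length,
    # rounded to a power of 2 so each bucket holds about chunk_length / 2 tokens on average.
    raw = max(2, 2 * sequence_length // chunk_length)
    return 2 ** int(math.log2(raw))

# With an illustrative chunk length of 64:
print(suggested_num_buckets(512, 64))    # sequence length of the 1st batch -> 16
print(suggested_num_buckets(16384, 64))  # maximum sequence length -> 512
```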

Is there a reason for this slightly different implementation from the paper?

Please let me know if I'm missing something.

Thanks
François