Reformer Model / Fixed default num_buckets


Following the PR, if the num_buckets hyperparameter is not provided, num_buckets is computed to a sensible value using the formula given in the paper (see page 4).

However, from my understanding, the HuggingFace implementation uses the sequence length of the first batch instead of the maximum sequence length. This can result in a lower num_buckets than the paper's formula would give.
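To make the concern concrete, here is a minimal sketch of the kind of computation I mean. The function below is my own illustration of the paper's recommendation (roughly num_buckets ≈ 2 × sequence length / chunk length, rounded to a power of 2), not the actual HuggingFace code; the chunk length of 64 and the two sequence lengths are made-up values for the example:

```python
def suggested_num_buckets(seq_len: int, chunk_length: int = 64) -> int:
    """Sketch of the paper's recommendation: num_buckets ~ 2 * seq_len / chunk_length,
    rounded down to a power of 2. Illustrative only, not the HF implementation."""
    raw = 2 * seq_len // chunk_length
    return 2 ** max(raw.bit_length() - 1, 1)

# If the first batch is much shorter than the model's maximum sequence length,
# the derived bucket count differs noticeably:
first_batch_len = 2048   # hypothetical length of the first batch seen
max_seq_len = 65536      # hypothetical maximum sequence length of the model

print(suggested_num_buckets(first_batch_len))  # -> 64
print(suggested_num_buckets(max_seq_len))      # -> 2048
```

Under these assumptions, deriving num_buckets from the first batch (64) rather than from the maximum length (2048) leaves far fewer hash buckets than the formula intends for long sequences.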

Is there a reason for this slightly different implementation from the paper?

Please let me know if I'm missing something.