Hi, it seems that the Reformer implementation defaults to six layers (via the attn_layers
option). That seems rather low. Is there a good rationale behind this, or does anyone have experience with whether adding more
(local, lsh) layer pairs is worth doing?
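
To make the question concrete, here is a minimal sketch of what I mean by adding more pairs. It assumes attn_layers accepts a list of layer-type strings alternating "local" and "lsh"; the helper name make_attn_layers is hypothetical and not part of any library.

```python
def make_attn_layers(num_pairs):
    """Build an alternating ["local", "lsh", ...] list with num_pairs pairs.

    Hypothetical helper for illustration only; the assumption is that the
    resulting list could be passed as the attn_layers option.
    """
    if num_pairs < 1:
        raise ValueError("num_pairs must be at least 1")
    return ["local", "lsh"] * num_pairs


# Three pairs give six layers, the default depth mentioned above.
default_like = make_attn_layers(3)

# Six pairs would give a twelve-layer stack.
deeper = make_attn_layers(6)

print(len(default_like), len(deeper))
```

The open question is whether the deeper variant actually buys anything in practice.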