I see that T5 does use scale_parameter (it appears as multiply_by_parameter_scale in their operative config):
https://console.cloud.google.com/storage/browser/_details/t5-data/experiments/scaling/sc-bi_v1-bsx4/operative_config.gin
Parameters for AdafactorOptimizer:
==============================================================================
AdafactorOptimizer.beta1 = 0.0
AdafactorOptimizer.clipping_threshold = 1.0
AdafactorOptimizer.decay_rate = None
AdafactorOptimizer.epsilon1 = 1e-30
AdafactorOptimizer.epsilon2 = 0.001
AdafactorOptimizer.factored = True
AdafactorOptimizer.min_dim_size_to_factor = 128
AdafactorOptimizer.multiply_by_parameter_scale = True
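
For context, here is my understanding of what that flag does, as a rough stdlib-only sketch rather than the actual Mesh TensorFlow code. With parameter scaling on, the step size is multiplied by max(epsilon2, RMS(parameter)), so epsilon2 acts as a floor for parameters initialized at zero. The helper names `rms` and `step_size` and the `relative_step` argument are made up for illustration.

```python
import math


def rms(values):
    """Root-mean-square of a flat list of parameter values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def step_size(param_values, relative_step, epsilon2=0.001,
              multiply_by_parameter_scale=True):
    """Hypothetical helper: scale the step by the parameter's own RMS.

    epsilon2 is a lower bound on the scale, so an all-zero parameter
    still receives a non-zero update.
    """
    if not multiply_by_parameter_scale:
        return relative_step
    return relative_step * max(epsilon2, rms(param_values))


weights = [3.0, 4.0]  # RMS = sqrt((9 + 16) / 2) = sqrt(12.5)

scaled = step_size(weights, 0.01)
unscaled = step_size(weights, 0.01, multiply_by_parameter_scale=False)
zero_init = step_size([0.0, 0.0], 0.01)  # falls back to the epsilon2 floor
```

So with the flag set to True, as in the config above, large-magnitude weights take proportionally larger steps, and the value passed as the learning rate behaves as a relative step rather than an absolute one.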