Hi friends,
I wonder if there’s a easy way to implement annealing temperature softmax:
For example, I want to change 314 in modeling_bert.py from
attention_probs = nn.Softmax(dim=-1)(attention_scores)
to
attention_probs = nn.Softmax(dim=-1)(attention_scores/temp)
where “temp” is a variable decaying as the training process goes on according to a scheduler like the learning rate.
Thank you!