Easy way to implement annealing temperature softmax

Hi friends,

I wonder if there’s an easy way to implement softmax with an annealed temperature.

For example, I want to change line 314 in modeling_bert.py from

attention_probs = nn.Softmax(dim=-1)(attention_scores)

to

attention_probs = nn.Softmax(dim=-1)(attention_scores/temp)

where “temp” is a variable that decays over the course of training according to a scheduler, much like the learning rate does.

Thank you!

You can just make that change :)
Model files are kept completely independent from each other just so you can easily tweak them for experiments like this.
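
In case it helps, here is a minimal sketch of what the decaying “temp” could look like, written in plain Python so it runs without any dependencies. The names annealed_temperature, softmax_with_temperature, start_temp and end_temp are made up for illustration and are not part of the library, and exponential decay is only one possible schedule. In modeling_bert.py itself you would keep nn.Softmax and only divide attention_scores by the current temperature, as in the question.

```python
import math


def annealed_temperature(step, total_steps, start_temp=2.0, end_temp=1.0):
    """Hypothetical schedule: decay exponentially from start_temp to end_temp."""
    # Clamp progress to [0, 1] so the value stays at end_temp after total_steps.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start_temp * (end_temp / start_temp) ** progress


def softmax_with_temperature(scores, temp):
    """Softmax over a list of floats after dividing each score by temp."""
    scaled = [s / temp for s in scores]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.1]
    for step in (0, 50, 100):
        temp = annealed_temperature(step, total_steps=100)
        probs = softmax_with_temperature(scores, temp)
        print(step, round(temp, 3), [round(p, 3) for p in probs])
```

A higher temperature flattens the distribution and a lower one sharpens it, so decaying toward 1.0 gradually recovers the standard softmax. One way to get the value into the model (an assumption, not something the library provides) is to store it as an attribute on the attention module and update it once per training step, next to the learning rate scheduler step.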