Is attention_mask implemented correctly in BERT?

I was browsing through the BERT model code and noticed that the attention_mask is applied as a simple addition to the attention scores:
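
Paraphrasing from memory (so variable names and shapes may be slightly off), the relevant part of `BertSelfAttention` boils down to something like this:

```python
import math
import torch
import torch.nn.functional as F

# Toy shapes: (batch, num_heads, seq_len, head_size); the real code derives these from the config.
query_layer = torch.randn(1, 12, 8, 64)
key_layer = torch.randn(1, 12, 8, 64)
# As far as I can tell, this is just the 1/0 padding mask that gets passed down from BertModel.
attention_mask = torch.tensor([1., 1., 1., 1., 1., 1., 0., 0.]).view(1, 1, 1, 8)

attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
attention_scores = attention_scores / math.sqrt(query_layer.size(-1))
attention_scores = attention_scores + attention_mask  # <-- the addition I'm asking about
attention_probs = F.softmax(attention_scores, dim=-1)
```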

This looks strange to me, because the original implementation maps 0s to -10000 before the softmax:
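
For reference, the original TensorFlow implementation does roughly the following (again paraphrased, not an exact quote):

```python
import tensorflow as tf

attention_mask = tf.constant([[1., 1., 1., 1., 1., 1., 0., 0.]])  # 1 = real token, 0 = padding
attention_scores = tf.random.normal((1, 12, 8, 8))                # raw pre-softmax scores

# 1 -> 0 and 0 -> -10000, so padded positions end up with ~zero probability after the softmax.
adder = (1.0 - tf.cast(attention_mask, tf.float32)) * -10000.0
attention_scores += adder
attention_probs = tf.nn.softmax(attention_scores, axis=-1)
```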

I searched through the file but couldn't find an equivalent mapping, so it looks like the raw attention mask is being added directly to the logits. Am I missing something?