Question about all_head_size under BertSelfAttention

While going through the codebase, I found the following code in the BertSelfAttention class (BertSelfAttention-):

    self.num_attention_heads = config.num_attention_heads
    self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
    self.all_head_size = self.num_attention_heads * self.attention_head_size

I am not sure why all_head_size is computed as num_attention_heads * attention_head_size instead of simply being assigned config.hidden_size directly.
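To check my reading, I reproduced the two assignments as a standalone function (head_sizes is my own hypothetical helper, not part of the library). Because int() truncates, num_attention_heads * attention_head_size only equals hidden_size when the division is exact:

```python
from typing import Tuple


def head_sizes(hidden_size: int, num_attention_heads: int) -> Tuple[int, int]:
    """Standalone sketch mirroring the two assignments in BertSelfAttention.__init__."""
    # int() truncates toward zero, so any remainder of the division is discarded
    attention_head_size = int(hidden_size / num_attention_heads)
    all_head_size = num_attention_heads * attention_head_size
    return attention_head_size, all_head_size


# BERT-base-like sizes: 768 is divisible by 12, so all_head_size == hidden_size
print(head_sizes(768, 12))  # (64, 768)
# Arbitrary non-divisible sizes: truncation makes all_head_size smaller than hidden_size
print(head_sizes(100, 12))  # (8, 96)
```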

Am I missing something?

I also see a check right above this that ensures hidden_size is divisible by num_attention_heads, unless the config has an embedding_size attribute:

    if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"):
        raise ValueError(
            "The hidden size (%d) is not a multiple of the number of attention "
            "heads (%d)" % (config.hidden_size, config.num_attention_heads)
        )
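To see when this guard actually fires, here is a standalone copy of it run against two hypothetical config objects. They are built with types.SimpleNamespace as a stand-in for a real config class, and the values 100, 12 and 128 are arbitrary:

```python
from types import SimpleNamespace


def check_divisible(config) -> None:
    """Standalone copy of the guard quoted above (a sketch, not the library code)."""
    if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"):
        raise ValueError(
            "The hidden size (%d) is not a multiple of the number of attention "
            "heads (%d)" % (config.hidden_size, config.num_attention_heads)
        )


# Hypothetical configs; SimpleNamespace only provides the attributes the guard reads
plain = SimpleNamespace(hidden_size=100, num_attention_heads=12)
with_embedding = SimpleNamespace(hidden_size=100, num_attention_heads=12, embedding_size=128)

try:
    check_divisible(plain)
except ValueError as err:
    # 100 % 12 == 4 and there is no embedding_size attribute, so the guard raises
    print("rejected:", err)

# No exception: the mere presence of embedding_size skips the divisibility check
check_divisible(with_embedding)
print("accepted with embedding_size set")
```

So a config that carries an embedding_size attribute can pass a hidden_size that is not a multiple of num_attention_heads, and in that case num_attention_heads * attention_head_size does not equal hidden_size.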