Speech2TextModel does not support small d_model

I am trying to use Speech2TextModel with a very small d_model (d_model=2), but it raises a ZeroDivisionError. Here is a simplified program to reproduce it:

from transformers import Speech2TextModel, Speech2TextConfig
import torch
with torch.inference_mode():
    cfg = Speech2TextConfig(d_model=2)
    model = Speech2TextModel(cfg)

Error:

emb = math.log(10000) / (half_dim - 1)
ZeroDivisionError: float division by zero
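
A guess at the cause, as a minimal sketch: the failing line looks like the usual sinusoidal positional embedding setup, where half_dim is presumably computed as d_model // 2. That derivation of half_dim is an assumption on my part (it is not shown in the traceback), and the function name sinusoid_scale below is hypothetical, used only to isolate the arithmetic. Under that assumption, d_model=2 gives half_dim=1, so the denominator half_dim - 1 is zero.

```python
import math


def sinusoid_scale(d_model: int) -> float:
    """Hypothetical helper isolating the arithmetic from the traceback.

    Assumes half_dim = d_model // 2, which is not confirmed by the
    traceback itself.
    """
    half_dim = d_model // 2
    # Same expression as the failing line in the traceback.
    return math.log(10000) / (half_dim - 1)


if __name__ == "__main__":
    for d_model in (2, 3, 4):
        try:
            print(d_model, sinusoid_scale(d_model))
        except ZeroDivisionError as exc:
            print(d_model, "ZeroDivisionError:", exc)
```

If the assumption holds, d_model=2 and d_model=3 both hit the zero denominator, and d_model=4 is the smallest value that gets past this particular line.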

Is this the expected behavior? How can I use the speech2text model with d_model=2?