I am trying to use Speech2TextModel with a very small d_model configuration (d_model=2), but constructing the model raises a ZeroDivisionError. Here is a simplified program to reproduce it:
from transformers import Speech2TextModel, Speech2TextConfig
import torch

with torch.inference_mode():
    cfg = Speech2TextConfig(d_model=2)
    model = Speech2TextModel(cfg)
Error:
emb = math.log(10000) / (half_dim - 1)
ZeroDivisionError: float division by zero
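My guess at the cause, as a sketch rather than the library's actual code: if half_dim is computed as the embedding dimension floor-divided by 2 (an assumption on my part, since the traceback only shows the division itself), then d_model=2 gives half_dim=1 and the divisor half_dim - 1 is zero. The helper names below are hypothetical and only mirror the failing line.

```python
import math


def sinusoidal_scale(embedding_dim: int) -> float:
    """Hypothetical helper mirroring the failing line from the traceback."""
    # Assumption: half_dim is embedding_dim // 2 (not shown in the traceback).
    half_dim = embedding_dim // 2
    return math.log(10000) / (half_dim - 1)


def fails(embedding_dim: int) -> bool:
    """Return True if the scale computation divides by zero."""
    try:
        sinusoidal_scale(embedding_dim)
    except ZeroDivisionError:
        return True
    return False


for d in (2, 3, 4):
    print(d, fails(d))
# prints:
# 2 True
# 3 True
# 4 False
```

Under that assumption, only dimensions 2 and 3 hit the zero divisor, and 4 is the smallest even value that avoids it.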
Is this the expected behavior? If so, is there a way to use the Speech2Text model with d_model=2?