Not able to overfit a transformer model on my data

I have a very bursty time series that I want to forecast.
I am deliberately trying to overfit on a small subset of the data. I took 30 sample trajectories of length 150 to train the transformer on. The transformer has more than 10 million parameters, yet the model doesn’t overfit.
In each trajectory, the first 75 points are shown to the model, and the model should predict the next 75 steps (75-step-ahead forecasting). I am using the Adam optimizer and MSE loss.
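To make the setup concrete, here is a minimal sketch of the data split and the loss in plain Python. It is not my actual training code: the synthetic bursty generator and the names `split_trajectory`, `mse`, and `make_bursty_trajectory` are hypothetical stand-ins, and the "predict the context mean" baseline is only there as a reference number.

```python
import random

CONTEXT_LEN = 75  # points shown to the model
HORIZON = 75      # points the model must predict


def split_trajectory(traj, context_len=CONTEXT_LEN, horizon=HORIZON):
    """Split one trajectory into (context, target) windows."""
    if len(traj) != context_len + horizon:
        raise ValueError("trajectory length must equal context_len + horizon")
    return traj[:context_len], traj[context_len:]


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def make_bursty_trajectory(rng, length=150, burst_prob=0.05):
    """Synthetic stand-in for the real data: small noise plus rare large spikes."""
    traj = []
    for _ in range(length):
        value = rng.gauss(0.0, 0.1)
        if rng.random() < burst_prob:
            value += rng.uniform(5.0, 10.0)  # occasional burst
        traj.append(value)
    return traj


rng = random.Random(0)  # fixed seed so the sketch is reproducible
trajectories = [make_bursty_trajectory(rng) for _ in range(30)]
pairs = [split_trajectory(t) for t in trajectories]

# Naive baseline: predict the context mean for every future step.
baseline_losses = []
for context, target in pairs:
    context_mean = sum(context) / len(context)
    baseline_losses.append(mse([context_mean] * len(target), target))

baseline_mse = sum(baseline_losses) / len(baseline_losses)
print(f"baseline MSE over {len(pairs)} trajectories: {baseline_mse:.4f}")
```

If the model were really memorizing these 30 trajectories, I would expect its training MSE to drop well below a baseline like this and approach zero.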
Do you have any idea why I can’t overfit?