Masking task with BERT on time series

Hi everyone,
I have started pre-training BERT on a masking task in the time series domain. I use a custom tokenization (not the one from a standard model) and mask some samples with a special token. But during training the loss stays almost constant (around 6.7; I use SparseCategoricalCrossentropy).
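To put the 6.7 in context, here is a rough sanity-check sketch in plain Python (not the actual training code). It assumes the loss is the usual mean of -log p(true token) over the masked positions; the helper names and the vocabulary size used in the example are hypothetical, since the real vocabulary size is not stated here. A model that guesses uniformly over V tokens gets a loss of ln(V), so a loss stuck near that value may mean the model has not learned anything beyond uniform guessing.

```python
import math

def sparse_categorical_crossentropy(probs, target):
    """Loss for one position: -log of the probability assigned to the true token id."""
    return -math.log(probs[target])

def masked_mean_loss(prob_rows, targets, mask):
    """Average the per-position loss over masked positions only (mask[i] == 1)."""
    losses = [
        sparse_categorical_crossentropy(p, t)
        for p, t, m in zip(prob_rows, targets, mask)
        if m
    ]
    return sum(losses) / len(losses)

def uniform_baseline(vocab_size):
    """Loss of a model that predicts every token with probability 1 / vocab_size."""
    return math.log(vocab_size)

def implied_vocab_size(loss):
    """Vocabulary size for which uniform guessing would give exactly this loss."""
    return math.exp(loss)

if __name__ == "__main__":
    # Hypothetical toy example: 4 tokens, 3 positions, the middle one is not masked.
    uniform_row = [0.25, 0.25, 0.25, 0.25]
    rows = [uniform_row, uniform_row, uniform_row]
    print(masked_mean_loss(rows, targets=[0, 2, 3], mask=[1, 0, 1]))
    print(uniform_baseline(4))
    # Which vocabulary size would make 6.7 the uniform-guessing loss?
    print(implied_vocab_size(6.7))
```

If the vocabulary of the custom tokenization is close to the value printed by `implied_vocab_size(6.7)`, the constant loss would be consistent with uniform predictions; if it is much larger or smaller, the cause is probably elsewhere (for example, how the masked positions and labels are passed to the loss).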
Could anyone help me?

Thanks, guys.
