Which loss function is used to train the Whisper model?

I read the paper on the Whisper model:

Robust Speech Recognition via Large-Scale Weak Supervision

The authors don't state which loss function they used.

It seems that they trained the model as a classification task (predicting the next token), so did they use cross-entropy loss?
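
To make clear what I mean by cross-entropy here, this is a minimal sketch of the loss computed per output token and averaged over the sequence. It is only my assumption of what such a loss would look like, not something taken from the paper; the function name `token_cross_entropy` and the toy values are made up for illustration.

```python
import math

def token_cross_entropy(logits, target_ids):
    """Mean cross-entropy over a sequence of next-token predictions.

    logits: list of per-position score lists, one score per vocabulary entry.
    target_ids: list with the index of the correct token at each position.
    (Hypothetical helper, not from the Whisper paper or codebase.)
    """
    total = 0.0
    for scores, target in zip(logits, target_ids):
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability assigned to the correct token.
        total += log_z - scores[target]
    return total / len(target_ids)

# Toy example: 2 output positions, vocabulary of 3 made-up tokens.
logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
targets = [0, 2]
loss = token_cross_entropy(logits, targets)
print(loss)
```

If this is the right picture, each decoder step would be a classification over the vocabulary, and the training loss would be the average of these per-token terms.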