Fine-tuning Whisper on a custom dataset

I’m trying to fine-tune Whisper large-v2 on a custom dataset of over 7,000 hours of speech. However, the audio files are very long, since they are recordings of news reports, radio broadcasts, conferences, etc. I think most files are over an hour long.
Is it possible to train on them as they are, or do I have to split them into 30-second segments? If I do have to split them, please advise me on an efficient way to process that much data quickly.
I have access to a server with 8x A100 GPUs, so memory shouldn’t be a problem.
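
To make the question concrete, here is a minimal sketch of what fixed 30-second splitting would mean in terms of sample offsets. It assumes the audio has already been resampled to 16 kHz, the rate Whisper's feature extractor expects. `chunk_bounds` is a hypothetical helper name, not part of any library, and the sketch does not cover splitting the matching transcripts.

```python
from typing import Iterator, Tuple

SAMPLE_RATE = 16_000   # assumption: audio already resampled to 16 kHz
CHUNK_SECONDS = 30     # Whisper works on fixed 30-second windows


def chunk_bounds(num_samples: int,
                 sample_rate: int = SAMPLE_RATE,
                 chunk_seconds: int = CHUNK_SECONDS) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) sample indices of consecutive windows.

    Every window is chunk_seconds long except possibly the last one,
    which holds whatever audio remains.
    """
    chunk_len = sample_rate * chunk_seconds
    for start in range(0, num_samples, chunk_len):
        yield start, min(start + chunk_len, num_samples)


# A 65-second recording gives two full windows and a 5-second remainder.
bounds = list(chunk_bounds(65 * SAMPLE_RATE))
print(bounds)
```

Cutting at fixed offsets like this can split a word in half, so cutting on silences or on transcript timestamps (if the data has them) would probably be a better choice.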