How to change the head in HuBERT/Wav2Vec2 for downstream video classification

Hi everyone. I am trying to switch to Hugging Face and am having a tough time. Can anybody please help me with how to add a new classification head to a pre-trained model such as Wav2Vec2 for other downstream classification tasks? Do we need to modify the loss as well?
Is there a detailed tutorial on this? Could someone please provide a small code example here?
Thank you.