Intermediate features from a Huggingface pretrained model

Hi everyone, I am new to the Hugging Face library. I want to use the TimeSformer or VideoMAE pretrained models for my research, and I need the output of, e.g., the 3rd layer of a pretrained model.
So can I do it as follows?

from transformers import TimesformerModel

model = TimesformerModel.from_pretrained("facebook/timesformer-base-finetuned-k400")
outputs = model(**inputs, output_hidden_states=True)  # hidden_states is None unless this flag is set
final_out = outputs.hidden_states[3]  # index 0 is the embedding output, so the 3rd block is index 3

Is that the correct way, or am I missing any other details?
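For what it's worth, the indexing convention I'm assuming (based on the Transformers docs for encoder models) is that `hidden_states` is a tuple with the embedding output at index 0 and the output of the i-th transformer block at index i. A toy sketch of that convention, with made-up placeholder strings instead of real tensors:

```python
# Toy illustration of the hidden_states tuple returned by Hugging Face
# encoder models when output_hidden_states=True:
#   index 0      -> embedding output
#   index i >= 1 -> output of the i-th transformer block
def toy_forward(x, num_layers=12):
    hidden_states = [f"embeddings({x})"]  # index 0: embedding output
    for i in range(1, num_layers + 1):
        hidden_states.append(f"layer{i}_out")  # index i: i-th block's output
    return tuple(hidden_states)

hs = toy_forward("video")
print(hs[3])    # layer3_out -- the 3rd block is index 3, not 2
print(len(hs))  # 13 = embeddings + 12 layers
```

So if I want the 3rd layer's features, I believe I should index `hidden_states[3]`, since `hidden_states[2]` would give me the 2nd block's output.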
Also, where can I find the list of available pretrained checkpoints? For example, I want to use the Kinetics-600 (k600) models.

Thanks a lot!