Extracting speech recognition hidden states while chunking the audio

Hi! I want to extract automatic speech recognition features (the final hidden state) while also chunking the audio. AutomaticSpeechRecognitionPipeline supports sophisticated chunking (cutting at the silent parts, I guess), but it doesn’t return the hidden states. Is there an easy way to do this without modifying the source code? Thanks in advance!
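
For reference, here is a minimal sketch of the manual fallback, in case the pipeline can't be used as-is. It does not cut at silences. It assumes fixed-length windows that overlap by `stride` samples on each side, and it drops the frames computed from the overlapping edges before concatenating. Everything named here is hypothetical: `encode` stands in for whatever returns per-frame features for one chunk (for example, a wrapper around a model forward pass that reads the last hidden state), and `samples_per_frame` assumes the encoder emits exactly one frame per fixed number of input samples, which real models only approximate.

```python
from typing import Callable, List, Sequence

Frame = List[float]


def extract_chunked_features(
    samples: Sequence[float],
    encode: Callable[[Sequence[float]], List[Frame]],  # hypothetical: chunk -> frames
    chunk_len: int,
    stride: int,
    samples_per_frame: int,
) -> List[Frame]:
    """Encode overlapping windows and keep only the non-overlapping middle of each."""
    if chunk_len <= 2 * stride:
        raise ValueError("chunk_len must be larger than 2 * stride")

    n = len(samples)
    step = chunk_len - 2 * stride  # new samples covered by each window
    features: List[Frame] = []

    for start in range(0, n, step):
        end = min(start + chunk_len, n)
        # The first window has no left context to drop, the last has no right context.
        left = 0 if start == 0 else stride
        right = 0 if end == n else stride

        frames = encode(samples[start:end])

        # Convert the overlap from samples to frames and trim both edges.
        left_f = left // samples_per_frame
        right_f = right // samples_per_frame
        features.extend(frames[left_f:len(frames) - right_f])

        if end == n:
            break

    return features
```

With a real model, `encode` would be the only part touching the library, so the pipeline source stays unmodified. The trade-off is that the window boundaries are fixed rather than chosen by the pipeline.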