Speech-to-text: timestamps for words

Hey, I’m using the HuBERT model to transcribe a WAV file to text (facebook/hubert-large-ls960-ft on Hugging Face). Is there a way to also get a timestamp for every word?
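
For context, here is a rough sketch of the kind of output I'm after. It assumes the model is a CTC model that emits one character token per audio frame, that `<pad>` is the CTC blank, that `|` separates words, and that each frame covers about 20 ms (320 samples at 16 kHz). The function name `ctc_word_timestamps` and its parameters are made up for illustration and are not part of any library. The `frame_tokens` list would come from taking the argmax over the logits and mapping each ID to its token string. I believe recent versions of `transformers` can also do this through `output_word_offsets=True` when decoding with the Wav2Vec2 CTC tokenizer, but I have not verified that with this checkpoint.

```python
from itertools import groupby


def ctc_word_timestamps(frame_tokens, seconds_per_frame=0.02,
                        blank="<pad>", delimiter="|"):
    """Collapse per-frame CTC tokens into words with start/end times (seconds).

    frame_tokens: one token string per output frame (argmax of the logits).
    seconds_per_frame: assumed frame duration; 0.02 corresponds to a stride
    of 320 samples at a 16 kHz sampling rate.
    """
    words = []
    chars, start, end = [], None, None
    frame = 0

    def flush():
        # Emit the word collected so far, if any.
        if chars:
            words.append({
                "word": "".join(chars),
                "start": round(start * seconds_per_frame, 3),
                "end": round(end * seconds_per_frame, 3),
            })
            chars.clear()

    # groupby merges consecutive identical tokens, which is the CTC
    # "collapse repeats" step; blanks are then skipped.
    for token, run in groupby(frame_tokens):
        run_start = frame
        frame += len(list(run))
        if token == blank:
            continue
        if token == delimiter:
            flush()
            continue
        if not chars:
            start = run_start  # first frame of the word's first character
        chars.append(token)
        end = frame            # one past the last frame of this character

    flush()  # last word may not be followed by a delimiter
    return words


if __name__ == "__main__":
    # Hypothetical frame-level prediction, for illustration only.
    frames = ["<pad>", "H", "H", "I", "<pad>", "|", "|", "Y", "O", "<pad>", "O"]
    for w in ctc_word_timestamps(frames):
        print(w)
```

The times are only as precise as the frame grid, and CTC models tend to emit each character on a short spike of frames, so the word boundaries from this approach are approximate.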