Help using Whisper with embeddings

I’m trying to use Whisper to transcribe a wav file.
I’m drowning in the documentation and configuration options.

  • I want to use the base or medium model.
  • I want to be able to choose whether to run on CPU or GPU.
  • For a test.wav file, I want to get its transcription.
  • In addition to the transcription, I want to get the encoder embeddings for this test.wav.

Can you please help me and write the code for this?

@laro1 did you get anywhere on this? I’m on the same path as you, but I haven’t been able to get the encoder embeddings.
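
Not a definitive answer, but here is a minimal sketch of one way to do this. It assumes the openai-whisper package (pip install openai-whisper) with ffmpeg on the PATH. The helper names choose_device and transcribe_with_embeddings are my own, not part of the library. It also assumes the clip fits in one 30-second window: the embeddings below cover only the first 30 seconds of audio, so a longer file would need to be split into chunks.

```python
def choose_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" when a GPU is requested and usable, otherwise "cpu"."""
    if not prefer_gpu:
        return "cpu"
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU path; fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def transcribe_with_embeddings(path: str, model_name: str = "base", device: str = "cpu"):
    """Return (text, encoder_embeddings) for the audio file at `path`.

    Hypothetical helper: `model_name` would be "base" or "medium" here.
    """
    # Imported lazily so the module loads even before whisper is installed.
    import torch
    import whisper

    model = whisper.load_model(model_name, device=device)

    # 1) Transcription. fp16 is only useful on GPU, so disable it on CPU.
    result = model.transcribe(path, fp16=(device == "cuda"))

    # 2) Encoder embeddings for the first 30-second window.
    audio = whisper.load_audio(path)      # 16 kHz mono float32 array
    audio = whisper.pad_or_trim(audio)    # pad or cut to exactly 30 s
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels)
    mel = mel.to(model.device)

    with torch.no_grad():
        # embed_audio runs only the encoder; input needs a batch dimension.
        embeddings = model.embed_audio(mel.unsqueeze(0))

    # Drop the batch dimension and move to CPU for easy saving or inspection.
    return result["text"], embeddings.squeeze(0).cpu()
```

Calling transcribe_with_embeddings("test.wav", "base", choose_device()) should give you the transcript string plus a tensor of encoder states. As far as I know that tensor is 1500 frames by 512 features for base and 1500 by 1024 for medium, but check embeddings.shape on your own install. Pass choose_device(prefer_gpu=False) to force CPU.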