NLLB text embeddings

Hi,
I am currently working with the text embeddings produced by the following model:
from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained('facebook/nllb-200-distilled-600M')

I want to combine these text embeddings with acoustic embeddings of the audio, but I am not sure how to do that. How can I get the encoder's last hidden state, so that I can combine it with the acoustic embeddings before it is passed to the decoder?
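A minimal sketch of one possible approach, assuming PyTorch and a recent `transformers` release. It calls the encoder alone through `model.get_encoder()` and reads `last_hidden_state` (shape `(batch, text_len, d_model)`). It then hands the combined states to the decoder through the `encoder_outputs` argument, which both the model's forward pass and `generate()` accept. The fusion step is an assumption: `ConcatFusion`, `acoustic_dim`, `fused_loss` and `decode_fused` are hypothetical names. The sketch also assumes the acoustic features arrive as a `(batch, frames, acoustic_dim)` tensor with their own padding mask. They are linearly projected to the model width and appended after the text states along the sequence axis, with the attention mask extended to match.

```python
import torch
from torch import nn
from transformers.modeling_outputs import BaseModelOutput


def encode_text(model, input_ids, attention_mask):
    """Run only the encoder of a seq2seq model such as NLLB.

    Returns the encoder's last hidden state, shape (batch, text_len, d_model).
    """
    encoder = model.get_encoder()
    outputs = encoder(input_ids=input_ids, attention_mask=attention_mask)
    return outputs.last_hidden_state


class ConcatFusion(nn.Module):
    """Hypothetical fusion module (an assumption, not part of NLLB).

    Projects acoustic features to the model width and appends them after
    the text states along the sequence axis, extending the mask to match.
    """

    def __init__(self, acoustic_dim, d_model):
        super().__init__()
        self.proj = nn.Linear(acoustic_dim, d_model)

    def forward(self, text_states, text_mask, acoustic, acoustic_mask):
        acoustic_states = self.proj(acoustic)  # (batch, audio_len, d_model)
        fused = torch.cat([text_states, acoustic_states], dim=1)
        # Cast so both masks share a dtype before concatenation.
        fused_mask = torch.cat([text_mask, acoustic_mask.to(text_mask.dtype)], dim=1)
        return fused, fused_mask


def fused_loss(model, fused, fused_mask, labels):
    """Training step: the decoder cross-attends to the fused states
    instead of re-running the encoder on input_ids."""
    out = model(
        encoder_outputs=BaseModelOutput(last_hidden_state=fused),
        attention_mask=fused_mask,
        labels=labels,
    )
    return out.loss


def decode_fused(model, fused, fused_mask, **generate_kwargs):
    """Inference: generate() skips the encoder when encoder_outputs is given."""
    return model.generate(
        encoder_outputs=BaseModelOutput(last_hidden_state=fused),
        attention_mask=fused_mask,
        **generate_kwargs,
    )
```

With the real checkpoint, you would load the tokenizer with `AutoTokenizer.from_pretrained('facebook/nllb-200-distilled-600M', src_lang='eng_Latn')`, build the fusion with `ConcatFusion(acoustic_dim, model.config.d_model)`, and pass `forced_bos_token_id=tokenizer.convert_tokens_to_ids('fra_Latn')` to `decode_fused` to select the target language. The projection starts untrained, so the decoder would likely need fine-tuning on fused inputs (for example with `fused_loss`) before generation gives meaningful output. Concatenation is only one design choice. If the text and audio sequences are aligned one-to-one, element-wise addition of the projected states would be a simpler alternative.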
Thanks in advance.