Vall-e and Vall-e X implementation

Is there a plan to add Vall-e and Vall-e X to the transformers library, as was done for the SpeechT5 model?

I’d be happy to contribute. If anyone from the HF team is interested, we can have a chat and see how this could come to life in HF.