How to know the data format required by a model?


I’m trying to send audio to a model for speech-to-text.
I’m using the Ilyes/wav2vec2-large-xlsr-53-french model, and when I send a WAV file it works well.

When I print the data, I see that the WAV file is converted to a bytestring before being sent to the model, so I converted my own mic recording to a bytestring too.
But when I send a sound recorded from my mic, I get a “Malformed soundfile” error.
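One possible cause, stated as an assumption rather than a diagnosis: the bytes read from a WAV file include the RIFF/WAVE header, while raw samples captured from a mic usually do not. A minimal sketch of wrapping raw PCM in a WAV container with Python's standard `wave` module follows. The helper name `pcm_to_wav_bytes` and the defaults (16 kHz, mono, 16-bit) are illustrative choices; 16 kHz is used because wav2vec2 XLSR checkpoints are generally described as expecting 16 kHz input.

```python
import io
import wave


def pcm_to_wav_bytes(pcm_bytes, sample_rate=16000, channels=1, sample_width=2):
    """Wrap raw PCM samples in a WAV container and return the file as bytes.

    Hypothetical helper: the defaults are assumptions and must match how
    the mic recording was actually captured.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample (2 = 16-bit)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()


# Stand-in for a mic recording: one second of 16-bit mono silence.
raw_pcm = b"\x00\x00" * 16000
wav_bytes = pcm_to_wav_bytes(raw_pcm)

# The result now starts with the RIFF/WAVE header that raw samples lack.
print(wav_bytes[:4], wav_bytes[8:12])
```

If the recording was captured at a different rate or sample width, those values would need to be passed in instead of the defaults, and the audio may still need resampling before the model handles it well.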

Is there a way to know exactly what input format a model expects (bytestring, headers, etc.)?
Is there a standard for sound data that I should know about?
Or is there documentation for each model?
Or is there no documentation, so that I should study the training data to get this info?

What’s the best practice for getting detailed info on the input data format for a given model?
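While looking for that documentation, one way to narrow things down is to compare the header of the file that works with the bytes of the recording that fails. The sketch below uses only the standard library; `is_wav_container` and `describe_wav` are hypothetical helper names, and the in-memory file stands in for a real recording. Note that the `wave` module reads plain PCM WAV files and may reject other encodings.

```python
import io
import wave


def is_wav_container(data):
    """Return True if the bytes start with a RIFF/WAVE header."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def describe_wav(data):
    """Read the basic format fields from WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav_file:
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": wav_file.getframerate(),
            "frames": wav_file.getnframes(),
        }


# Build a small in-memory WAV so the example runs without any file on disk.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 8000)  # half a second of silence
example_wav = buffer.getvalue()

# Raw samples without a header, as a mic capture might produce them.
headerless_pcm = b"\x00\x00" * 8000

print(is_wav_container(example_wav), describe_wav(example_wav))
print(is_wav_container(headerless_pcm))
```

Running the same two checks on the working file and on the mic recording should show whether they differ in container, channel count, sample width, or sample rate.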


Well, I started the datasets tutorial, which answers my question ^^