Speech recognition on SIP telephony recordings encoded with the G.729 codec

I need to fine-tune a multilingual speech recognition model so that it recognizes speech more accurately on audio received via SIP telephony (G.729 codec). Is it a good idea to use xlsr-wav2vec2 for this task after decoding my recordings to .wav, or is there a model that was already trained on G.729-encoded recordings?
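
To make the "decode to .wav" step concrete, here is a minimal sketch of what I have in mind. It assumes `ffmpeg` is installed and can read the recordings as stored (a raw G.729 bitstream may need an explicit input format hint). G.729 is a narrowband 8 kHz codec and the wav2vec2 XLSR checkpoints expect 16 kHz input, so the sketch also resamples. The function names and file names are hypothetical.

```python
import subprocess

# wav2vec2 XLSR checkpoints were pretrained on 16 kHz audio,
# while G.729 telephony audio is sampled at 8 kHz.
TARGET_SAMPLE_RATE = 16_000


def build_decode_command(src, dst, sample_rate=TARGET_SAMPLE_RATE):
    """Build an ffmpeg command that decodes src to mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input recording (assumed readable by ffmpeg)
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample to the rate the model expects
        "-c:a", "pcm_s16le",      # uncompressed 16-bit little-endian PCM
        str(dst),
    ]


def decode_recording(src, dst):
    """Run ffmpeg; raises CalledProcessError if decoding fails."""
    subprocess.run(build_decode_command(src, dst), check=True)
```

Resampling to 16 kHz does not restore what the codec discarded, so the decoded files would still carry G.729 artifacts. That is the mismatch I am hoping fine-tuning can compensate for.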