Dealing with proper nouns in wav2vec2

Has anyone had any success fine-tuning wav2vec2 on multiple languages at the same time (bilingual model)?

The problem I’m dealing with is proper nouns. The names of politicians and celebrities are often English, so a system trained on, say, German won’t handle them well unless they appear in the training dataset. I notice that ASR systems from Google/Microsoft handle this case just fine, so I’d appreciate it if someone has insight into how this is done.

My instinct was to train on a combined German + English dataset, but I suspect the performance of such a system would be poor.
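
To make the combined-dataset idea concrete, here is a rough sketch of the first step I have in mind: building one shared character vocabulary from both languages, so that a single CTC head can emit German and English characters. The function name `build_joint_vocab`, the toy transcripts, and the special tokens (`|`, `[UNK]`, `[PAD]`) are my own assumptions for illustration, not something I have tested in a real training run.

```python
def build_joint_vocab(transcripts_by_lang):
    """Build one character-level vocabulary shared by all languages.

    transcripts_by_lang: dict mapping a language code to a list of transcripts.
    Returns a dict mapping each token to an integer id.
    """
    chars = set()
    for transcripts in transcripts_by_lang.values():
        for text in transcripts:
            chars.update(text.lower())

    # The space is represented by the word delimiter "|" instead,
    # and "|" itself is re-added below so it gets exactly one id.
    chars.discard(" ")
    chars.discard("|")

    vocab = {ch: idx for idx, ch in enumerate(sorted(chars))}

    # Special tokens (assumed names): word delimiter, unknown, CTC blank/padding.
    for token in ("|", "[UNK]", "[PAD]"):
        vocab[token] = len(vocab)
    return vocab


# Toy data only; real transcripts would come from the two training sets.
toy_transcripts = {
    "de": ["größe", "angela merkel"],
    "en": ["joe biden"],
}
joint_vocab = build_joint_vocab(toy_transcripts)
```

The resulting dictionary could be dumped to a `vocab.json` file and used to set up a CTC tokenizer. Whether a model trained on such a shared vocabulary actually holds up on each language is exactly what I’m unsure about.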