I am using sentence transformers (I tried many multilingual tokenizers), but they do not perform as well on my non-English text as they do on English.
For now I translate the text first using the DeepL API and then tokenize the translated version. Good results so far.
What are the downsides of this method? The translation is very fast and the costs are manageable.
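
For reference, the translate-then-embed setup described above could be sketched roughly as follows. This is a minimal sketch, not my exact code: `make_translate_then_embed`, `translate`, and `embed` are hypothetical names, and the two callables are placeholders for whatever translation client and embedding model are actually used. The small in-memory cache is an added assumption so that repeated texts are only translated once.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def make_translate_then_embed(
    translate: Callable[[str], str],
    embed: Callable[[str], Vector],
) -> Callable[[Sequence[str]], List[Vector]]:
    """Build a pipeline that translates each text, then embeds the translation.

    Both callables are injected (hypothetical stand-ins), so the pipeline
    itself does not depend on any particular API or model.
    """
    cache: Dict[str, str] = {}  # source text -> translated text

    def run(texts: Sequence[str]) -> List[Vector]:
        vectors: List[Vector] = []
        for text in texts:
            # Translate only texts we have not seen before.
            if text not in cache:
                cache[text] = translate(text)
            vectors.append(embed(cache[text]))
        return vectors

    return run


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without network access or model files.
    def toy_translate(text: str) -> str:
        return {"hallo welt": "hello world"}.get(text, text)

    def toy_embed(text: str) -> Vector:
        return [float(len(text)), float(text.count("o"))]

    pipeline = make_translate_then_embed(toy_translate, toy_embed)
    print(pipeline(["hallo welt", "hello world"]))
```

In a real setup, `translate` would presumably wrap the DeepL client (for example `translator.translate_text(text, target_lang="EN-US").text` from the `deepl` package) and `embed` would wrap `SentenceTransformer(...).encode(text)`. Keeping both injectable also makes it easy to swap in a multilingual model without translation and compare the two on the same data.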