How to apply TranslationPipeline from English to Brazilian Portuguese?

I’ve tried the following approach, with no success:

from transformers import pipeline

translator = pipeline(
    model="t5-small", 
    task="translation_en_to_br"
    )

translator("How old are you?", src_lang="en", tgt_lang="br")
# [{'translation_text': '         '}]

Could you give me some directions?

As far as I can tell, T5 has only been trained/finetuned on English, German, French, and Romanian; see Section 3.1.3 of the paper. I am not aware of Brazilian Portuguese models. Also, I don’t think Brazilian Portuguese has an official language code in these models, so “br” is unlikely to work anyway.
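As a sanity check, you can see that the same pipeline pattern works when you stick to one of T5’s supported pairs (a minimal sketch, assuming `t5-small` downloads successfully in your environment; the exact output text may vary by transformers version):

```python
from transformers import pipeline

# English -> French is one of the pairs T5 was actually trained on,
# so this task string is recognized and produces real French output.
translator = pipeline(task="translation_en_to_fr", model="t5-small")

result = translator("How old are you?")
print(result[0]["translation_text"])
```

The failure you saw with `translation_en_to_br` comes from asking the model for a language it never learned, not from the pipeline API itself.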

I’ve checked the following, but it produces garbage:
pipeline('translation_en_to_br', model='Helsinki-NLP/opus-mt-en-mul')('>>br<<How old are you?')
[{'translation_text': '♫ Horatos edad tu?'}]

Maybe you should try Narrativa/mbart-large?

Thank you @Marcin,

I had already tested Narrativa/mbart-large, which produces the following result:

!pip install transformers

from transformers import pipeline

translator = pipeline(
    model="Narrativa/mbart-large-50-finetuned-opus-en-pt-translation",
    task="translation_en_to_pt"
    )


translator("How old are you?", src_lang="en", tgt_lang="pt")
# [{'translation_text': 'pt - - - - - - - - - - - - - - - - - - - - - - - - - - 
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -'}]

Please use the code from the model card:

from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

ckpt = 'Narrativa/mbart-large-50-finetuned-opus-en-pt-translation'

tokenizer = MBart50TokenizerFast.from_pretrained(ckpt)
model = MBartForConditionalGeneration.from_pretrained(ckpt)

tokenizer.src_lang = 'en_XX'

def translate(text):
    inputs = tokenizer(text, return_tensors='pt')
    input_ids = inputs.input_ids
    attention_mask = inputs.attention_mask
    output = model.generate(input_ids, attention_mask=attention_mask, forced_bos_token_id=tokenizer.lang_code_to_id['pt_XX'])
    return tokenizer.decode(output[0], skip_special_tokens=True)

text = "How old are you?"
translation = translate(text)

print(f"text = {text}\ntranslation = {translation}")

It outputs:

text = How old are you?
translation = Quantos anos tens?

@BramVanroy,

I’ve tried different codes (pt, pt_br, pt_BR) based on Helsinki-NLP/opus-mt-en-ROMANCE model card.

Maybe it is a TranslationPipeline-related issue.

The output is correct when using the following approach:

from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

ckpt = 'Narrativa/mbart-large-50-finetuned-opus-en-pt-translation'

tokenizer = MBart50TokenizerFast.from_pretrained(ckpt)
model = MBartForConditionalGeneration.from_pretrained(ckpt).to("cuda")

tokenizer.src_lang = 'en_XX'

def translate(text):
    inputs = tokenizer(text, return_tensors='pt')
    input_ids = inputs.input_ids.to('cuda')
    attention_mask = inputs.attention_mask.to('cuda')
    output = model.generate(input_ids, attention_mask=attention_mask, forced_bos_token_id=tokenizer.lang_code_to_id['pt_XX'])
    return tokenizer.decode(output[0], skip_special_tokens=True)

translate("Who are you?")
# Quem és tu?
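For what it’s worth, the pipeline itself may also work with this checkpoint if you pass mBART’s own language codes (`en_XX` / `pt_XX`) instead of bare `"en"`/`"pt"` — a sketch under the assumption that your transformers version forwards `src_lang`/`tgt_lang` through to the mBART-50 tokenizer:

```python
from transformers import pipeline

ckpt = "Narrativa/mbart-large-50-finetuned-opus-en-pt-translation"

# Generic "translation" task; the language pair is supplied per call.
translator = pipeline("translation", model=ckpt)

# mBART-50 uses xx_XX-style codes, so "en"/"pt" would not resolve to
# the tokens the model expects.
result = translator("How old are you?", src_lang="en_XX", tgt_lang="pt_XX")
print(result[0]["translation_text"])
```

If that still produces garbage on your version, the manual `generate` approach above with `forced_bos_token_id` remains the reliable fallback.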