Does T5Tokenizer support the Greek language?

When I run the three lines of code below, the resulting input_ids are just 2 and 3, which correspond to the unknown token and the underscore, respectively. The same happens for any input text in Greek letters.

from transformers import T5Tokenizer
tokenizer = T5Tokenizer.from_pretrained("t5-small")
input_ids = tokenizer('Γειά σου Κόσμε', return_tensors='pt').input_ids

Hi,

T5 itself was trained on English data only. However, there’s a multilingual variant called mT5 which supports Greek.
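To check this yourself, here is a minimal sketch that measures how much of a text a tokenizer maps to its unknown token and compares the two checkpoints. The helper names unknown_token_fraction and compare_checkpoints are made up for illustration. I'm assuming the "google/mt5-small" checkpoint for mT5 and that the sentencepiece package is installed. I haven't listed expected output, since it depends on the vocabularies.

```python
def unknown_token_fraction(tokenizer, text):
    """Return the fraction of token ids for `text` that equal the tokenizer's unknown-token id."""
    ids = tokenizer(text).input_ids  # plain list of ints when return_tensors is not set
    if not ids:
        return 0.0
    unknown = sum(1 for token_id in ids if token_id == tokenizer.unk_token_id)
    return unknown / len(ids)


def compare_checkpoints(text, names=("t5-small", "google/mt5-small")):
    """Print the tokens and unknown-token fraction for each checkpoint (downloads the tokenizers)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoTokenizer

    for name in names:
        tokenizer = AutoTokenizer.from_pretrained(name)
        print(name)
        print("  tokens:", tokenizer.tokenize(text))
        print("  unknown fraction:", unknown_token_fraction(tokenizer, text))
```

Calling compare_checkpoints('Γειά σου Κόσμε') should show whether the Greek text collapses to unknown tokens under t5-small and whether the mT5 tokenizer splits it into real subword pieces.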
