Does T5Tokenizer support the Greek language?

Does T5Tokenizer support the Greek language?

When I run the 3 lines of code below, then the input_ids are just 2 and 3 which correspond to the unknown token and the underscore respectively. This is the same for any input text of Greek letters.

from transformers import T5Tokenizer
tokenizer = T5Tokenizer.from_pretrained(ā€œt5-smallā€)
input_ids = tokenizer(ā€˜Ī“ĪµĪ¹Ī¬ ĻƒĪæĻ… ĪšĻŒĻƒĪ¼Īµā€™, return_tensors=ā€˜ptā€™).input_ids

Hi,

T5 itself was trained on English data only. However, thereā€™s a multilingual variant called mT5 which supports Greek.

1 Like