Tokenizer.encode not returning encodings

How do you get token encodings? The method isn’t working in this case.

I need the token offsets in order to translate my labels into a normal list of token tags (my labels are in this format: [{start_index: int, end_index: int, tag: str}, … ]).

Thank you!

Maybe it’s because this is a pre-trained fast tokenizer?


You need to use the tokenizer directly on your text, not the encode method:

tokenizer("hello world")

Thank you Sylvain! That worked. I also had to pass return_offset_mapping=True to get the offsets.
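For anyone landing here with the same label format, below is a rough sketch of how the offsets could be turned into one tag per token. The helper name spans_to_token_tags and the "O" outside tag are made up for illustration, and it assumes end_index is exclusive, which the thread does not confirm. The offsets are hand-written in the shape that tokenizer(text, return_offset_mapping=True)["offset_mapping"] returns for a fast tokenizer, so the snippet runs without downloading a model.

```python
def spans_to_token_tags(offsets, labels, outside_tag="O"):
    """Assign one tag per token from character-span labels (hypothetical helper)."""
    tags = []
    for tok_start, tok_end in offsets:
        tag = outside_tag
        # Special tokens such as [CLS]/[SEP] are reported with an empty (0, 0) span.
        if tok_end > tok_start:
            for label in labels:
                # Assumption: the label covers characters [start_index, end_index).
                if tok_start < label["end_index"] and tok_end > label["start_index"]:
                    tag = label["tag"]
                    break
        tags.append(tag)
    return tags


# Hand-written offsets for "hello world", in the (start, end) shape of
# tokenizer("hello world", return_offset_mapping=True)["offset_mapping"].
offsets = [(0, 0), (0, 5), (6, 11), (0, 0)]
labels = [{"start_index": 6, "end_index": 11, "tag": "LOC"}]

token_tags = spans_to_token_tags(offsets, labels)
print(token_tags)
```

If your end_index is inclusive instead, the overlap check would need tok_start <= label["end_index"]. A token that overlaps several labels takes the first matching one here, which may or may not be what you want.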
