Tokenizer.encode not returning encodings

astarostap · October 8, 2021, 1:25pm

How do you get token encodings? The method isn’t working in this case.

I need the token offsets in order to translate my labels into a plain list of per-token tags. My labels are in this format: [{start_index: int, end_index: int, tag: str} … ]

Thank you!

Maybe it’s because this is a pre-trained fast tokenizer?

sgugger · October 8, 2021, 2:03pm

You need to use the tokenizer directly on your text, not the encode method:

tokenizer("hello world")

astarostap · October 9, 2021, 2:03am

Thank you Sylvain! That worked. I also had to pass return_offsets_mapping=True (note the plural "offsets") to get the character offsets for each token.
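
For anyone landing here later, this is a rough sketch of the label translation step described in the question. It is not from the thread itself: the helper name spans_to_token_tags, the "O" tag for unlabeled tokens, and the assumption that end_index is exclusive are all illustrative choices. The offsets are hand-written here so the snippet runs without downloading a model; in practice they would come from the "offset_mapping" entry of tokenizer(text, return_offsets_mapping=True) on a fast tokenizer.

```python
from typing import Dict, List, Tuple


def spans_to_token_tags(
    offsets: List[Tuple[int, int]],
    labels: List[Dict],
    outside: str = "O",
) -> List[str]:
    """Assign one tag per token from character-span labels.

    Assumes each label looks like
    {"start_index": int, "end_index": int, "tag": str}
    with end_index exclusive (an assumption, adjust if yours is inclusive).
    """
    tags = []
    for tok_start, tok_end in offsets:
        tag = outside
        # Special tokens usually get an empty (0, 0) offset; leave them as `outside`.
        if tok_end > tok_start:
            for label in labels:
                # The token is tagged if its character range overlaps the label span.
                if tok_start < label["end_index"] and tok_end > label["start_index"]:
                    tag = label["tag"]
                    break
        tags.append(tag)
    return tags


# Hand-written offsets standing in for:
#   tokenizer(text, return_offsets_mapping=True)["offset_mapping"]
text = "hello world"
offsets = [(0, 0), (0, 5), (6, 11), (0, 0)]
labels = [{"start_index": 6, "end_index": 11, "tag": "LOC"}]

tags = spans_to_token_tags(offsets, labels)
print(tags)  # ['O', 'O', 'LOC', 'O']
```

The overlap check tags every sub-word token that touches a labeled span, so a word split into several pieces gets the same tag on each piece. If you need B-/I- style tags, you would have to add that on top.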
