Printing tokens array

Mahmoodn · April 12, 2024, 1:06pm

Hi,
How can I see the tokens produced by tokenizer()? In this example:

raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
print(inputs)

The inputs contain only the numeric token IDs. I would like to see the tokens themselves. For example, I want to know whether "I've" is kept as a single token ['I've'] or split into ['i', "'", 've'].
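One way to inspect the tokens is sketched below. It relies on two tokenizer methods from the transformers library: tokenize(), which returns the subword strings for a text, and convert_ids_to_tokens(), which maps the IDs in inputs["input_ids"] back to token strings (including special and padding tokens). The helper name ids_to_tokens, the demo function, and the checkpoint name are assumptions for illustration; the original post does not say which checkpoint the tokenizer was loaded from.

```python
def ids_to_tokens(tokenizer, texts):
    """Return, for each input text, the token strings behind its input IDs."""
    # No return_tensors here: plain Python lists are enough for inspection.
    encoded = tokenizer(texts, padding=True, truncation=True)
    return [tokenizer.convert_ids_to_tokens(ids) for ids in encoded["input_ids"]]


def demo():
    # Imported here so the helper above has no hard dependency on transformers.
    from transformers import AutoTokenizer

    # Assumption: placeholder checkpoint; substitute the one you actually use.
    checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    raw_inputs = [
        "I've been waiting for a HuggingFace course my whole life.",
        "I hate this so much!",
    ]

    # Subword split of the first sentence, without special tokens or padding.
    print(tokenizer.tokenize(raw_inputs[0]))

    # Tokens recovered from the IDs, including special and padding tokens.
    for tokens in ids_to_tokens(tokenizer, raw_inputs):
        print(tokens)
```

Calling demo() downloads the tokenizer files on first use. The exact split of "I've" depends on the checkpoint's vocabulary, so the printed tokens are the way to check it for a given tokenizer.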
