Regarding Tokenizer

Tarun-1999M · July 5, 2024, 5:13pm

I'm having a hard time understanding how the tokenizer works.

from transformers import AutoModelForSequenceClassification, AutoTokenizer  # Hugging Face libraries
tkz = AutoTokenizer.from_pretrained(model)

The function def tkz_func(x): return tkz(x['input']) works perfectly when we apply it to a Dataset: it returns an updated dataset with input_ids, token_type_ids, and attention_mask.

But when we apply it to a DataFrame with df.apply(tkz_func, axis=1), it just returns the list of key names [input_ids, token_type_ids, attention_mask] as the value for every row.

Why?
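
One possible explanation, offered as an assumption rather than something confirmed in this thread: the tokenizer returns a BatchEncoding, which is dict-like but not a plain dict. Iterating over a dict-like object yields its keys, so pandas may end up unpacking the key names instead of the values. A minimal sketch of a workaround is to convert the output to a plain dict and wrap it in a pd.Series, so that each key becomes a column. To keep the sketch runnable without downloading a model, FakeEncoding and fake_tkz are hypothetical stand-ins for the real tokenizer output and for tkz, and tkz_func_df is a made-up name.

```python
from collections import UserDict

import pandas as pd


class FakeEncoding(UserDict):
    """Hypothetical stand-in for the tokenizer output: dict-like, but not a dict."""


def fake_tkz(text):
    # Hypothetical stand-in for tkz(...): one fake id per whitespace-separated word.
    words = text.split()
    return FakeEncoding(
        {
            "input_ids": list(range(len(words))),
            "token_type_ids": [0] * len(words),
            "attention_mask": [1] * len(words),
        }
    )


def tkz_func_df(row):
    # Convert the dict-like output to a plain dict, then to a Series,
    # so that pandas expands each key into its own column.
    return pd.Series(dict(fake_tkz(row["input"])))


df = pd.DataFrame({"input": ["hello world", "a b c"]})
encoded = df.apply(tkz_func_df, axis=1)
print(encoded)
```

With the real tokenizer, the same idea would presumably be return pd.Series(dict(tkz(row['input']))) inside the function passed to df.apply(..., axis=1); I have not verified this against every pandas or transformers version.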
