I am trying the map() function with the following code:
from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset('csv', data_files={'train': 'train.csv', 'validation': 'dev.csv', 'test': 'test.csv'}, column_names=['sentence1', 'sentence2', 'label'])
tokenizer = AutoTokenizer.from_pretrained('roberta-large')

def tokenize_function(samples):
    # Tokenize sentence pairs; truncation takes the boolean True, not the string 'True'
    return tokenizer(samples['sentence1'], samples['sentence2'], padding=True, truncation=True, max_length=256)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
The program seems to hang at this point.
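For context, here is a quick sanity check I can run to narrow this down (a minimal sketch; the slice size of 8 is arbitrary, and it assumes the CSV splits actually loaded). It calls the tokenizer directly on a small slice of the train split, outside of map(); if this returns immediately, the hang is in map() or in loading the dataset rather than in the tokenization itself:

# Slicing a Dataset returns a plain dict of column name -> list of values
small_batch = dataset['train'][:8]
encodings = tokenizer(
    small_batch['sentence1'],
    small_batch['sentence2'],
    padding=True,
    truncation=True,
    max_length=256,
)
# Each field (input_ids, attention_mask, ...) should have 8 entries
print({key: len(values) for key, values in encodings.items()})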