Hi, I'm working with a dataset containing Python code, which I tokenized with a Seq2Seq model's tokenizer. After preprocessing, when I split the data, the `input_ids` in the resulting dataset are reduced to only the special tokens. Please help me resolve this issue.
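For context, the preprocessing follows this general pattern (a simplified sketch, not my exact code; the checkpoint name, data file, and `code` column below are placeholders):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Placeholder checkpoint and data file for illustration only.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-small")
dataset = load_dataset("json", data_files="python_code.jsonl")

def preprocess(batch):
    # Tokenize the raw Python source; truncation keeps sequences within the model limit.
    return tokenizer(batch["code"], truncation=True, max_length=512)

tokenized = dataset.map(
    preprocess,
    batched=True,
    remove_columns=dataset["train"].column_names,  # drop raw text columns after tokenization
)

# Split after tokenization, then inspect a sample to check whether
# input_ids contain real tokens or only special tokens.
splits = tokenized["train"].train_test_split(test_size=0.1, seed=42)
print(splits["train"][0]["input_ids"][:20])
```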