Hi, I'm working with a dataset containing Python code, which I tokenized with a Seq2Seq model's tokenizer. After preprocessing, when I split the data, the `input_ids` in the resulting dataset are reduced to only the special tokens. Please help me resolve this issue.
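For context, the preprocessing follows this general pattern (a simplified sketch, not my exact code; the checkpoint name, data file, and `code` column below are placeholders):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Placeholder checkpoint and data file for illustration only.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-small")
dataset = load_dataset("json", data_files="python_code.jsonl")

def preprocess(batch):
    # Tokenize the raw Python source; truncation keeps sequences within the model limit.
    return tokenizer(batch["code"], truncation=True, max_length=512)

tokenized = dataset.map(
    preprocess,
    batched=True,
    remove_columns=dataset["train"].column_names,  # drop raw text columns after tokenization
)

# Split after tokenization, then inspect a sample to check whether
# input_ids contain real tokens or only special tokens.
splits = tokenized["train"].train_test_split(test_size=0.1, seed=42)
print(splits["train"][0]["input_ids"][:20])
```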