is_split_into_words and using a pipeline

I trained a DistilBERT model on text that was already split into sentences and then into words, so I called the tokenizer with is_split_into_words=True.

When using a pipeline for inference, do I have to pass the input already split into words, as I did during training? Is this required, or can I pass text that has only been split into sentences?
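To make the question concrete, here is a minimal sketch of one possible workaround. It assumes the model is a token classifier and that the pipeline takes a plain string and reports character offsets for each prediction; neither is stated above. The idea is to join the pre-split words with single spaces and record each word's character span, so that any prediction carrying character offsets can be mapped back to a word index. The helper names `join_words` and `word_index_at` are hypothetical and not part of any library.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def join_words(words: List[str]) -> Tuple[str, List[Span]]:
    """Join pre-split words with single spaces and record each
    word's (start, end) character span in the joined string."""
    spans: List[Span] = []
    pos = 0
    for word in words:
        spans.append((pos, pos + len(word)))
        pos += len(word) + 1  # +1 for the separating space
    return " ".join(words), spans


def word_index_at(spans: List[Span], start: int, end: int) -> int:
    """Return the index of the word whose span contains the character
    range [start, end), or -1 if no single word contains it."""
    for i, (s, e) in enumerate(spans):
        if s <= start and end <= e:
            return i
    return -1


# Hypothetical example input, already split into words.
words = ["Hugging", "Face", "is", "based", "in", "New", "York"]
text, spans = join_words(words)

# `text` is what would be sent to the pipeline as a plain string.
# A prediction with character offsets (assumed here) can then be
# mapped back to the original word list:
idx = word_index_at(spans, 29, 31)  # a sub-word piece inside "York"
print(text, idx, words[idx])
```

Whether the joined string is tokenized exactly like the pre-split words depends on the tokenizer, so this is only a starting point for comparing the two input forms.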
