Pipelines without a tokenizer

Hello,

I am trying to use a pipeline for zero-shot classification. Because I need to pre-tokenize the data in order to split large texts into chunks, I already have tokenized data that I would like to feed into the pipeline. Is there any way to do this? I don’t see an option to run a pipeline without a tokenizer.

Is my only option to define a new tokenizer that doesn’t actually do anything?
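For reference, the only alternative I can think of is skipping the pipeline altogether and calling an NLI model directly on my pre-split chunks. A rough sketch of what I mean is below; the model name, hypothesis template, and label ordering are just placeholders/assumptions, not something confirmed by the pipeline docs:

```python
# Sketch: zero-shot classification without the pipeline, so I control
# tokenization and chunking myself. Assumes an MNLI-style checkpoint
# (here facebook/bart-large-mnli) whose classes are
# (contradiction, neutral, entailment).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "facebook/bart-large-mnli"  # assumed NLI checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def zero_shot(chunk: str, labels: list[str]) -> dict:
    """Score each candidate label for one pre-split text chunk."""
    scores = {}
    for label in labels:
        hypothesis = f"This example is {label}."  # assumed template
        inputs = tokenizer(chunk, hypothesis, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        # Keep only contradiction vs. entailment and softmax over those two;
        # the entailment probability serves as the label score.
        entail_vs_contra = logits[0, [0, 2]]
        scores[label] = torch.softmax(entail_vs_contra, dim=-1)[1].item()
    return scores

print(zero_shot("The new GPU is incredibly fast.", ["technology", "sports", "cooking"]))
```

That works, but it means reimplementing what the pipeline already does, which is why I was hoping for a way to pass pre-tokenized inputs in directly.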

Thanks

Hi,

Hate to bring this back up, but did you ever find a solution to this? I was thinking of trying to make a custom tokenizer that doesn’t do anything, but I was wondering whether you found a better approach! Thank you!