Pipelines without a tokenizer

Hello,

I am trying to use a pipeline for zero-shot classification. Because I need to pre-tokenize the data in order to split large texts into chunks, I already have tokenized data that I would like to feed into the pipeline. Is there any way to do this? I don’t see an option to run a pipeline without a tokenizer.

Is my only option to define a new tokenizer that doesn’t actually do anything?
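For reference, the only alternative I can think of is skipping the pipeline altogether and calling an NLI model directly on my pre-split chunks. A rough sketch of what I mean is below; the model name, hypothesis template, and label ordering are just placeholders/assumptions, not something confirmed by the pipeline docs:

```python
# Sketch: zero-shot classification without the pipeline, so I control
# tokenization and chunking myself. Assumes an MNLI-style checkpoint
# (here facebook/bart-large-mnli) whose classes are
# (contradiction, neutral, entailment).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "facebook/bart-large-mnli"  # assumed NLI checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def zero_shot(chunk: str, labels: list[str]) -> dict:
    """Score each candidate label for one pre-split text chunk."""
    scores = {}
    for label in labels:
        hypothesis = f"This example is {label}."  # assumed template
        inputs = tokenizer(chunk, hypothesis, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        # Keep only contradiction vs. entailment and softmax over those two;
        # the entailment probability serves as the label score.
        entail_vs_contra = logits[0, [0, 2]]
        scores[label] = torch.softmax(entail_vs_contra, dim=-1)[1].item()
    return scores

print(zero_shot("The new GPU is incredibly fast.", ["technology", "sports", "cooking"]))
```

That works, but it means reimplementing what the pipeline already does, which is why I was hoping for a way to pass pre-tokenized inputs in directly.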

Thanks

Hi,

Hate to bring this back up, but did you ever find a solution to this? I was thinking of trying to make a custom tokenizer that doesn’t do anything, but I was wondering whether you found a better approach! Thank you!