I’m using a pretrained FinBERT model for text classification. I passed my 4000+ sentences through the tokenizer to create a single batch, but when I pass this batch to the model to get my outputs it takes forever and never finishes. When I pass 2000–3000 sentences it’s slow, but the model can process them. How can I pass 4000+ sentences?
Is this for training or inference?
If training, you should follow the Hugging Face training examples to learn how to use batching and a data loader.
If it’s for inference, you can write a small script that splits the 4000 sentences into mini-batches of a few dozen sentences each. Feed those into your model one at a time and accumulate the outputs.
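Here’s a minimal sketch of that loop. The chunking logic is the point; `classify_batch` is a hypothetical stand-in for your real tokenizer + model call (shown in the comments, assuming a checkpoint like `ProsusAI/finbert`), so the sketch runs without downloading anything:

```python
def chunks(items, size):
    """Yield successive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def classify_batch(batch):
    # In your real script this would be something like:
    #   inputs = tokenizer(batch, padding=True, truncation=True,
    #                      return_tensors="pt")
    #   with torch.no_grad():
    #       logits = model(**inputs).logits
    #   return logits.argmax(dim=-1).tolist()
    # Dummy labels here keep the sketch self-contained.
    return [0] * len(batch)

sentences = [f"sentence {i}" for i in range(4000)]

predictions = []
for batch in chunks(sentences, 32):  # 32 sentences per forward pass
    predictions.extend(classify_batch(batch))

print(len(predictions))  # one prediction per input sentence
```

Tune the batch size (32 here) to whatever fits your memory; the accumulated `predictions` list lines up one-to-one with your input sentences.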