Hello everyone,
There is something I'm not sure about. I saw the `pipeline` class for inference and I wanted to know if it implements, in some way, the following setup:
-the tokenizer runs on the CPU, creating the batches for the model
-the model on the GPU is fed those tokens after they are moved to the GPU.
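To make it concrete, here is a minimal sketch of the producer/consumer setup I have in mind, with the tokenizer and model replaced by stand-in functions (the real ones would come from `transformers`):

```python
import threading
import queue

def fake_tokenize(text):
    # stand-in for the real tokenizer; runs on the CPU
    return [ord(c) for c in text]

def fake_model(batch):
    # stand-in for the model forward pass; would run on the GPU
    return sum(len(tokens) for tokens in batch)

texts = ["hello", "world", "foo"]
batch_queue = queue.Queue(maxsize=2)  # bounded, so the CPU can't run far ahead
SENTINEL = None
results = []

def producer():
    # CPU thread: tokenize the inputs and enqueue one batch
    batch = [fake_tokenize(t) for t in texts]
    batch_queue.put(batch)
    batch_queue.put(SENTINEL)  # signal that no more batches are coming

def consumer():
    # "GPU" thread: pull batches off the queue and run the model
    while True:
        batch = batch_queue.get()
        if batch is SENTINEL:
            break
        results.append(fake_model(batch))

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # one result per batch
```

This is just to illustrate the overlap I mean: tokenization happening on one thread while the model consumes batches on another.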
I wanted to know if I have to implement the threading logic myself, or if there is a native HF way of doing that. Thanks!