Hello everyone,
There is something I'm not sure about. I saw the `pipeline` class for inference and I wanted to know if it implements, in some way, the following setup:
-the tokenizer runs on the CPU, creating the batches for the model
-the model on the GPU is fed those tokens after they are moved to the GPU.
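To make it concrete, here is a minimal sketch of the producer/consumer setup I have in mind, with the tokenizer and model replaced by stand-in functions (the real ones would come from `transformers`):

```python
import threading
import queue

def fake_tokenize(text):
    # stand-in for the real tokenizer; runs on the CPU
    return [ord(c) for c in text]

def fake_model(batch):
    # stand-in for the model forward pass; would run on the GPU
    return sum(len(tokens) for tokens in batch)

texts = ["hello", "world", "foo"]
batch_queue = queue.Queue(maxsize=2)  # bounded, so the CPU can't run far ahead
SENTINEL = None
results = []

def producer():
    # CPU thread: tokenize the inputs and enqueue one batch
    batch = [fake_tokenize(t) for t in texts]
    batch_queue.put(batch)
    batch_queue.put(SENTINEL)  # signal that no more batches are coming

def consumer():
    # "GPU" thread: pull batches off the queue and run the model
    while True:
        batch = batch_queue.get()
        if batch is SENTINEL:
            break
        results.append(fake_model(batch))

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # one result per batch
```

This is just to illustrate the overlap I mean: tokenization happening on one thread while the model consumes batches on another.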
I wanted to know if I have to implement the threading logic myself, or if there is a native HF way of doing that. Thanks!