HF transformers run a process parallel to LLM generation

shubhamugare · February 10, 2024, 2:03am

In my application, I currently run some operations inside logits processors to mask certain tokens. These computations are expensive, and they slow down LLM generation.

My tool would be much faster if I could run this computation in parallel with the LLM generation and then use its results to mask the tokens. The `LogitsProcessor` interface in the transformers library does not give me enough access to modify this flow. What would be the best approach to address this issue?
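
To make the idea concrete, here is a minimal sketch of the overlap I have in mind, written as a hand-rolled greedy decoding loop rather than with `generate`. Everything in it is a hypothetical stand-in: `fake_forward` plays the role of the model forward pass, `expensive_mask` plays the role of my costly constraint check, and `VOCAB_SIZE` is a toy value. The point is only that the mask for a step depends on the same prefix as the forward pass, so both can start at the same time. A `LogitsProcessor` is only called once the scores for the step already exist, so I assume something like this would have to live in a custom loop.

```python
from concurrent.futures import ThreadPoolExecutor
import time

VOCAB_SIZE = 8  # toy vocabulary size (assumption, for illustration only)


def fake_forward(prefix):
    """Hypothetical stand-in for the model forward pass: one score per token id."""
    time.sleep(0.01)  # pretend the forward pass takes some time
    return [float((sum(prefix) + t) % VOCAB_SIZE) for t in range(VOCAB_SIZE)]


def expensive_mask(prefix):
    """Hypothetical stand-in for the costly check: returns the allowed token ids."""
    time.sleep(0.01)  # pretend the constraint computation takes some time
    parity = len(prefix) % 2
    return {t for t in range(VOCAB_SIZE) if t % 2 == parity}


def generate(prompt, max_new_tokens):
    """Greedy decoding where the mask is computed while the forward pass runs."""
    tokens = list(prompt)
    with ThreadPoolExecutor(max_workers=1) as pool:
        for _ in range(max_new_tokens):
            prefix = tuple(tokens)
            # Start the mask computation in a worker thread first...
            mask_future = pool.submit(expensive_mask, prefix)
            # ...then run the forward pass on the main thread in the meantime.
            scores = fake_forward(prefix)
            # Block until the mask is ready, then apply it to the scores.
            allowed = mask_future.result()
            masked = [s if t in allowed else float("-inf")
                      for t, s in enumerate(scores)]
            # Greedy pick among the tokens that survived the mask.
            tokens.append(max(range(VOCAB_SIZE), key=masked.__getitem__))
    return tokens
```

Calling `generate([1, 2], 4)` returns the prompt followed by four tokens that each satisfy the toy parity constraint. A thread only helps if the real mask computation releases the GIL (native code, I/O, or similar); for pure-Python CPU-bound work a separate process would probably be needed instead.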
