In my application, I currently run certain operations inside logits processors to mask specific tokens. These computations are expensive, and they slow down LLM generation.
My tool would be much faster if I could run this computation in parallel with the LLM generation and then use its results to mask the tokens. The LogitsProcessor interface in the transformers library does not give me enough access to modify this flow. What would be the best approach to address this issue?
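
To make the flow I have in mind concrete, here is a rough sketch using only the Python standard library. It relies on the assumption that the mask for a step depends only on the tokens generated so far, so it can be started before the forward pass and collected afterwards. Everything in it is hypothetical and for illustration only: `expensive_allowed_tokens`, `PrefetchingMasker`, `fake_forward`, and the toy vocabulary are stand-ins, plain lists stand in for tensors, and none of this is transformers API beyond mimicking the `(input_ids, scores)` call shape.

```python
import math
from concurrent.futures import ThreadPoolExecutor

VOCAB_SIZE = 8  # toy vocabulary size (assumption for illustration)


def expensive_allowed_tokens(prefix):
    """Hypothetical stand-in for my costly computation.

    Toy rule: allow only token ids whose parity matches len(prefix).
    """
    return {t for t in range(VOCAB_SIZE) if t % 2 == len(prefix) % 2}


class PrefetchingMasker:
    """Computes the mask in a background thread, applies it when called."""

    def __init__(self, executor):
        self.executor = executor
        self.pending = None  # (prefix, future) for the mask being computed

    def prefetch(self, prefix):
        # Start the mask computation for this prefix without blocking.
        key = tuple(prefix)
        self.pending = (key, self.executor.submit(expensive_allowed_tokens, key))

    def __call__(self, input_ids, scores):
        # Mimics a logits processor call, but on plain Python lists.
        key = tuple(input_ids)
        if self.pending is not None and self.pending[0] == key:
            # Blocks only if the background work has not finished yet.
            allowed = self.pending[1].result()
        else:
            # No matching prefetch: fall back to the synchronous path.
            allowed = expensive_allowed_tokens(key)
        self.pending = None
        return [s if t in allowed else -math.inf for t, s in enumerate(scores)]


def fake_forward(prefix):
    """Stand-in for the model forward pass; returns fixed toy logits."""
    return [float(t) for t in range(VOCAB_SIZE)]


def generate(max_new_tokens=4):
    tokens = [0]
    with ThreadPoolExecutor(max_workers=1) as pool:
        masker = PrefetchingMasker(pool)
        for _ in range(max_new_tokens):
            masker.prefetch(tokens)          # mask computation starts here
            logits = fake_forward(tokens)    # forward pass overlaps with it
            masked = masker(tokens, logits)  # mask is collected and applied
            # Greedy pick over the masked logits.
            tokens.append(max(range(VOCAB_SIZE), key=masked.__getitem__))
    return tokens
```

With a thread, the overlap only pays off if the forward pass releases the GIL or the mask computation runs in native code. Otherwise I assume a separate process would be needed.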