I want to implement an LLM inference server that holds a collection of Hugging Face models and supports streaming inference, returning one token at a time. The problem is that a single token may not decode to readable text on its own: with byte-level BPE tokenizers, one token can be just a fragment of a multi-byte UTF-8 character. What should I do to achieve this goal: only return output once the tokens decode to readable text?
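To make the question concrete, here is a minimal sketch of the buffering approach I have in mind, in Python with `transformers` (the `gpt2` tokenizer and the `stream_readable` helper are placeholders of mine, not a real API): accumulate token ids and only flush once the decoded text no longer ends with the Unicode replacement character U+FFFD, which a byte-level BPE tokenizer produces for an incomplete UTF-8 sequence.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model

def stream_readable(token_ids):
    """Yield readable text chunks from a stream of token ids.

    Buffers ids until they decode to text that does not end with the
    replacement character U+FFFD, which signals an incomplete
    multi-byte UTF-8 sequence.
    """
    buffer = []  # ids that do not yet decode to complete characters
    for token_id in token_ids:
        buffer.append(token_id)
        text = tokenizer.decode(buffer, skip_special_tokens=True)
        # Byte-level BPE can split one UTF-8 character across several
        # tokens; while the sequence is incomplete, keep buffering.
        if text and not text.endswith("\ufffd"):
            yield text
            buffer = []
    if buffer:  # flush whatever remains at end of stream
        yield tokenizer.decode(buffer, skip_special_tokens=True)

# Example: multi-byte characters are held back until complete.
ids = tokenizer.encode("你好, world")
print(list(stream_readable(ids)))
```

One thing I am unsure about: decoding each buffered span independently may drop context between chunks for some tokenizers (e.g. leading spaces with SentencePiece), so perhaps I should decode the full sequence each step and emit only the new suffix instead. I also see that `transformers` ships `TextStreamer` / `TextIteratorStreamer` classes for streaming decoded text, which might already cover this case. Is rolling my own buffer like the above the right approach, or should I reuse those streamer classes?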
Please excuse my poor English…