Hey, I’m looking to stream tokens out of text2text-generation models (more specifically those from the T5 family), similar to what you see in the OpenAI playground when requesting generations.
For text-generation models I’ve been able to emulate this (since we’re just predicting the next token from the ones before it) by setting a max_time parameter and appending model generations to the prompt, calling repeatedly until I’ve reached the number of tokens I wanted to generate. But I can’t do the same for text2text models.
Any advice on how I could stream output from models like T5? Is it even possible with the architecture?
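For reference, the pseudo-streaming loop described above for decoder-only models can be sketched as follows. This is a minimal illustration of the loop structure only; `step` is a hypothetical stand-in for a single-token model call, not a real transformers API.

```python
# Sketch of "pseudo-streaming" for decoder-only models: ask the model
# for one new token at a time and append it to the growing prompt.
# `step` is a hypothetical stand-in for a single-token generate() call.
def stream_generate(prompt, step, max_new_tokens):
    text = prompt
    for _ in range(max_new_tokens):
        token = step(text)       # ask the model for one more token
        if token is None:        # model signalled end-of-sequence
            break
        text += token
        yield token              # hand the token to the caller immediately

# Toy step function for illustration only: emits a fixed continuation.
def make_toy_step():
    tokens = iter([" wor", "ld", "!", None])
    return lambda text: next(tokens)

pieces = list(stream_generate("hello", make_toy_step(), max_new_tokens=10))
print("".join(pieces))  # -> " world!"
```

The catch, as noted above, is that this trick relies on the decoder-only setup where the output is just a continuation of the prompt, which is why it doesn't transfer directly to encoder-decoder models like T5.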
Got a solution working. In `generate()`, for the different types of sampling, for example `greedy_search()`, there is a `next_token` variable from which you can incrementally get the subsequent tokens generated by the model as soon as they are ready. You’ll have to decode it yourself and apply the special rules you’d get from `decode()`, but it works well. I monkey patched it to emit the new `next_token` on each generation step. Hope this helps anyone looking to do the same.
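The monkey-patching idea can be sketched in isolation like this. Note that `ToyGreedySearch` is a toy stand-in for the real `greedy_search()` internals (not the transformers implementation); the point is only to show wrapping the spot where `next_token` is chosen so each token is pushed onto a queue the moment it exists.

```python
# Hedged sketch of streaming via monkey patching: wrap the method that
# picks next_token so every token is published to a queue immediately.
# ToyGreedySearch is a stand-in, not the real transformers code.
import queue

class ToyGreedySearch:
    def _pick_next_token(self, step):
        # stand-in for the model forward pass + argmax
        return ["Le", " chat", "<eos>"][step]

    def greedy_search(self, max_steps=3):
        out = []
        for step in range(max_steps):
            next_token = self._pick_next_token(step)
            if next_token == "<eos>":
                break
            out.append(next_token)
        return "".join(out)

def patch_for_streaming(model, token_queue):
    original = model._pick_next_token
    def hooked(step):
        next_token = original(step)
        token_queue.put(next_token)   # stream the token out immediately
        return next_token
    model._pick_next_token = hooked   # the monkey patch

q = queue.Queue()
model = ToyGreedySearch()
patch_for_streaming(model, q)
result = model.greedy_search()
print(result)                         # -> "Le chat"
# q received "Le", " chat", "<eos>" incrementally, in arrival order
```

A consumer (e.g. a web handler) can then drain the queue from another thread while generation is still running.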
Inspired by the solution from @zhuda, I made a streaming generation service for Hugging Face transformers that is fully compatible with the OpenAI API: https://github.com/hyperonym/basaran
Hey guys, why did you choose to monkey patch instead of registering one of the callbacks in `generate()`?
This is awesome, can’t wait to take a proper look!
We recently released a bunch of open-source tools to do this with any HuggingFace model.
Check out our new library text-generation-inference: https://github.com/huggingface/text-generation-inference. It powers the “Chat LLM Streaming” Space by olivierdehaene on Hugging Face.
We now also support a new `Streamer` class that works in tandem with `generate()`. Here’s a great Twitter thread by @joaogante going over it: https://twitter.com/joao_gante/status/1643330507093196800
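For anyone landing here later, a usage sketch with a T5 model might look like the following. This assumes the `TextIteratorStreamer` class from transformers (added around v4.28); imports are kept inside the function so merely defining it doesn’t download anything.

```python
# Sketch: streaming T5 output with transformers' TextIteratorStreamer.
# Assumes transformers >= 4.28, where generate() accepts a `streamer`.
def stream_t5(prompt, model_name="t5-small", max_new_tokens=40):
    from threading import Thread
    from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                              TextIteratorStreamer)

    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tok(prompt, return_tensors="pt")
    streamer = TextIteratorStreamer(tok, skip_special_tokens=True)

    # generate() blocks, so run it in a thread and consume the streamer
    thread = Thread(target=model.generate,
                    kwargs=dict(**inputs, streamer=streamer,
                                max_new_tokens=max_new_tokens))
    thread.start()
    for piece in streamer:            # yields decoded text incrementally
        print(piece, end="", flush=True)
    thread.join()

# Example call (downloads t5-small on first run):
# stream_t5("translate English to German: The house is wonderful.")
```

The iterator variant is handy for serving: the generation thread pushes decoded text while your request handler pulls pieces off the streamer as they arrive.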
This is perfect, will port OpenPlayground over to this solution soon!