LangGraph supports streaming via the self.graph.stream method. However, the examples provided by Hugging Face only demonstrate direct LLM invocation (e.g., OpenAI LLMs), and it appears that streaming from a HuggingFacePipeline is not supported.
For instance, after building the graph and invoking self.graph.stream with stream_mode set to "messages", the expected behavior is for the response to stream incrementally when the generate node is triggered. Instead of streaming, however, the graph API buffers the output and only pushes the full response after completion, which suggests that the yield inside the node is not having the intended effect.
Has anyone managed to get this working? Below is an example of the generate method:
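A minimal sketch of the kind of node and graph.stream call described above (the state schema, the gpt2 pipeline, and all names here are illustrative placeholders, not the original code):

from typing import TypedDict

from langchain_huggingface.llms import HuggingFacePipeline
from langgraph.graph import StateGraph, START, END
from transformers import pipeline

# Illustrative local model; the real setup wraps its own pipeline.
llm = HuggingFacePipeline(
    pipeline=pipeline("text-generation", model="gpt2", max_new_tokens=50)
)

class State(TypedDict):
    question: str
    answer: str

def generate(state: State) -> dict:
    # The node calls the local pipeline; the question is whether its tokens
    # can be surfaced incrementally through graph.stream.
    return {"answer": llm.invoke(state["question"])}

builder = StateGraph(State)
builder.add_node("generate", generate)
builder.add_edge(START, "generate")
builder.add_edge("generate", END)
graph = builder.compile()

# stream_mode="messages" is expected to yield incremental (chunk, metadata)
# pairs, but with a local HuggingFacePipeline the full text only shows up
# once the node has finished, i.e. the behavior described above.
for chunk, metadata in graph.stream(
    {"question": "Hugging Face is"}, stream_mode="messages"
):
    print(getattr(chunk, "content", chunk), end="", flush=True)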
Hey @sheepyyy,
I’m not too familiar with LangGraph specifically, but I am with LangChain, so I tweaked a few things, and the following reproducer works fine: yield correctly streams the response chunks without any issues.
from langchain_huggingface.llms import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=100,
)

def generate(pipe, prompt):
    # Wrap the transformers pipeline in a LangChain LLM instance and stream from it.
    hf = HuggingFacePipeline(pipeline=pipe)
    for chunk in hf.stream(prompt):
        yield chunk

prompt = "Hugging Face is"
for chunk in generate(pipe, prompt):
    print(chunk, end="", flush=True)
If you’re implementing your own class and generate method, I’d recommend first initializing a HuggingFacePipeline instance, passing in the pipeline, and then calling the stream method on the instance instead of invoking stream as a class method.
Could you try this approach, adapt the yield to your requirements, and check whether it works for you?
If the issue persists, feel free to share more details, and we can look into this further.
Hey there, first of all thank you for your response! The solution you have there is basically where I got to before getting stuck trying to wrap it in LangGraph, which is apparently the “new way of doing things” going forward. Therefore, I was wondering whether that is a limitation of the graph.stream function itself or whether I have missed any sneaky arguments/config/parameters. I guess for now I can use this OG way to stream the LLM tokens. For reference, the graph streaming code I was referring to is this one: How to stream LLM tokens from your graph
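A rough, untested sketch of one possible workaround, based on LangGraph’s custom streaming how-to and reusing the State and llm placeholders from the sketch further up: have the node forward the pipeline’s chunks through a stream writer and read them back with stream_mode="custom". The StreamWriter injection and stream_mode="custom" names come from the LangGraph docs, not from the snippets above, so double-check them against your installed version.

from langgraph.graph import StateGraph, START, END
from langgraph.types import StreamWriter

def generate(state: State, writer: StreamWriter) -> dict:
    # Stream chunks from the HuggingFacePipeline instance (llm, as in the
    # sketch above) and push each one onto the "custom" stream as it arrives,
    # while still returning the full answer as graph state.
    chunks = []
    for chunk in llm.stream(state["question"]):
        writer(chunk)
        chunks.append(chunk)
    return {"answer": "".join(chunks)}

builder = StateGraph(State)
builder.add_node("generate", generate)
builder.add_edge(START, "generate")
builder.add_edge("generate", END)
graph = builder.compile()

# stream_mode="custom" yields whatever the node wrote via the writer.
for chunk in graph.stream({"question": "Hugging Face is"}, stream_mode="custom"):
    print(chunk, end="", flush=True)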