Best way to use multiple pipelines in conjunction on a single dataset?

Hello. I’m extremely new to ML in general and wanted to know the best way to use multiple pipelines together. For example, I have one pipeline for summarizing text and another for determining the sentiment of a sentence. I would like to run both of them on a CSV file and log their results to a CSV as fast as reasonably possible.

I am able to do this by just calling the pipelines ‘normally’, but it feels slower than it should be, and it’s not using much CPU or system memory (though it is using about half of the GPU memory).

The code below is how I’m batching the data and calling the pipelines. I don’t think I’m making any obvious mistakes:

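    from torch.utils.data import DataLoader

    # `dataset` and `analyzer` are created earlier (not shown)
    loader = DataLoader(dataset["train"], batch_size=32, num_workers=4, pin_memory=True)
    for chunk in loader:
        # each chunk holds 32 rows, but the rows are analyzed one at a time
        for title, content in zip(chunk["title"], chunk["content"]):
            print(title)
            analyzer_result = analyzer.analyze_text(content)
            print(analyzer_result.sentiment, analyzer_result.tags)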

It doesn’t help that I’ve never really done ML before, so I have no idea how fast it should be going (I’m only getting about 3 iterations per second).
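
For reference, the alternative I’m considering is to hand each pipeline a whole batch of texts per call instead of one string per call, and write the results out as I go. This is only a minimal sketch of that idea: `summarize` and `classify` are hypothetical stand-ins for my two pipelines (assumed to take a list of strings and return one result per string), and `batched`, `run_pipelines`, and the output column names are made up for illustration.

```python
import csv
from typing import Callable, Iterable, Iterator, List, Sequence


def batched(rows: Iterable, batch_size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `batch_size` items."""
    batch: List = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # leftover rows that did not fill a whole batch
        yield batch


def run_pipelines(
    rows: Iterable[dict],
    summarize: Callable[[Sequence[str]], Sequence],  # hypothetical stand-in
    classify: Callable[[Sequence[str]], Sequence],   # hypothetical stand-in
    out_path: str,
    batch_size: int = 32,
) -> int:
    """Call both stand-ins once per batch and write one CSV row per input row."""
    written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "summary", "sentiment"])
        for batch in batched(rows, batch_size):
            texts = [row["content"] for row in batch]
            summaries = summarize(texts)   # one call for the whole batch
            sentiments = classify(texts)   # one call for the whole batch
            for row, summary, sentiment in zip(batch, summaries, sentiments):
                writer.writerow([row["title"], summary, sentiment])
                written += 1
    return written
```

I haven’t confirmed that this is actually faster with my pipelines, or that they can be dropped in as `summarize` and `classify` without some glue code to unpack their outputs.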

Any help is appreciated.
