Best way to use multiple pipelines in conjunction on a single dataset?

AMcGrail · June 27, 2022, 7:08am

Hello. I’m extremely new to ML in general and would like to know the best way to use multiple pipelines together. For example, I have one pipeline for summarizing text and another for determining the sentiment of a sentence. I would like to run both of them on a CSV file and log their results to a CSV as fast as reasonably possible.

I am able to do this by just calling the pipelines ‘normally’, but it feels somewhat slower than it should be, and it’s not using much CPU or system memory (though it is using quite a bit of GPU memory, about half of it).

The code below is how I’m batching the data and calling the analyzer. I don’t think I’m making any obvious mistakes:

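    from torch.utils.data import DataLoader

    # `dataset` and `analyzer` are created earlier (not shown here).
    loader = DataLoader(dataset["train"], batch_size=32, num_workers=4, pin_memory=True)
    for chunk in loader:
        # Each chunk holds up to 32 rows, but the analyzer is called once per row.
        for title, content in zip(chunk["title"], chunk["content"]):
            print(title)
            analyzer_result = analyzer.analyze_text(content)
            print(analyzer_result.sentiment, analyzer_result.tags)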

It doesn’t help that I’ve never really done ML before, so I have no idea how fast it should really be going (I’m only getting about 3 iterations per second).
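
For comparison, here is a minimal sketch of a batch-level variant of the loop above, where each pipeline receives the whole list of texts in a batch instead of one row at a time, and the results go straight to a CSV. Everything named here is an assumption for illustration: `batches` and `run_both` are hypothetical helpers, the `summary` and `sentiment` column names are made up, and each pipeline is assumed to be a callable that takes a list of strings and returns one dict per input, with a `summary_text` key for the summarizer and a `label` key for the classifier. Whether this actually improves throughput depends on the models and hardware.

```python
import csv
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

# Assumption: a "pipeline" is any callable mapping a list of strings to a
# list of dicts, one dict per input string, in the same order.
Pipeline = Callable[[List[str]], List[Dict]]


def batches(rows: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` items from `rows`."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def run_both(
    rows: Iterable[Dict[str, str]],
    summarizer: Pipeline,
    classifier: Pipeline,
    out_path: str,
    batch_size: int = 32,
) -> int:
    """Run both pipelines over `rows` batch by batch and write one CSV row per input.

    Each input row is expected to have "title" and "content" keys.
    Returns the number of data rows written.
    """
    written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "summary", "sentiment"])
        for batch in batches(rows, batch_size):
            texts = [row["content"] for row in batch]
            # One call per pipeline per batch, rather than one call per row.
            summaries = summarizer(texts)
            sentiments = classifier(texts)
            for row, summary, sentiment in zip(batch, summaries, sentiments):
                writer.writerow(
                    [row["title"], summary["summary_text"], sentiment["label"]]
                )
                written += 1
    return written
```

Called as `run_both(dataset["train"], summarizer, classifier, "results.csv")`, this would iterate the dataset row by row, assuming each row is a dict with `title` and `content` fields as in the loop above. The two pipelines still run one after the other on each batch; the only change from the original loop is that each model sees a list of texts per call.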

Any help is appreciated.
