Hello. I’m extremely new to ML in general and would like to know the best way to use multiple pipelines together. For example, I have one pipeline for summarizing text and another for determining the sentiment of a sentence. I would like to run both of them on a CSV file and log their results to a CSV as fast as reasonably possible.
I am able to do this by just calling the pipelines ‘normally’, but it feels slower than it should be, and it isn’t using much CPU or memory (though it is using about half of the GPU memory).
The code below shows how I’m batching the data and calling the analyzer. I don’t think I’m making any obvious mistakes:
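from torch.utils.data import DataLoader

# dataset and analyzer are created earlier (not shown)
loader = DataLoader(dataset["train"], batch_size=32, num_workers=4, pin_memory=True)
for chunk in loader:
    # Each chunk holds up to 32 rows, but the analyzer is still called on one text at a time
    for title, content in zip(chunk["title"], chunk["content"]):
        print(title)
        analyzer_result = analyzer.analyze_text(content)
        print(analyzer_result.sentiment, analyzer_result.tags)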
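For comparison, here is a minimal sketch of the same loop with each model called once per batch instead of once per text, writing the results to a CSV. The names `batched`, `run_pipelines`, `summarize_batch`, and `classify_batch` are hypothetical stand-ins, not part of the code above; it assumes each stand-in accepts a list of strings and returns one result per string, and the fake functions exist only so the sketch runs without downloading any model.

```python
import csv
import io


def batched(rows, size):
    """Yield lists of up to `size` consecutive rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


def run_pipelines(rows, summarize_batch, classify_batch, out_file, batch_size=32):
    """Call each model once per batch and write one CSV row per input row.

    summarize_batch and classify_batch are hypothetical callables that take
    a list of texts and return a list of results of the same length.
    """
    writer = csv.writer(out_file)
    writer.writerow(["title", "summary", "sentiment"])
    written = 0
    for batch in batched(rows, batch_size):
        contents = [row["content"] for row in batch]
        summaries = summarize_batch(contents)   # one call for the whole batch
        sentiments = classify_batch(contents)   # one call for the whole batch
        for row, summary, sentiment in zip(batch, summaries, sentiments):
            writer.writerow([row["title"], summary, sentiment])
            written += 1
    return written


# Stand-in "models" used only to make the sketch runnable.
def fake_summarize(texts):
    return [text[:20] for text in texts]


def fake_classify(texts):
    return ["positive" if "good" in text else "negative" for text in texts]


demo_rows = [
    {"title": f"post {i}", "content": "good text" if i % 2 == 0 else "bad text"}
    for i in range(5)
]
demo_out = io.StringIO()
demo_written = run_pipelines(demo_rows, fake_summarize, fake_classify, demo_out, batch_size=2)
print(demo_written, "rows written")
```

If the two models are Hugging Face `transformers` pipelines, the stand-ins could be wrappers that pass the whole list along with a `batch_size` argument, for example `lambda texts: summarizer(texts, batch_size=32)`, and then pull the wanted field out of each returned dict. Whether `analyzer.analyze_text` can be restructured this way depends on how it wraps the pipelines, which isn’t shown here.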
I’ve never really done ML before, so I have no idea how fast this should be going; I’m currently getting only about 3 iterations per second.
Any help is appreciated.