Any advice on LLM inference over a large dataset?

For a research project, I'd like to apply a single prompt to each entry in a text column of a large-ish dataset and collect the results into a new column.

As a first test, I played around in Databricks with the Falcon 7B model and ran a list of 80 prompts; it took a few hours to finish. I'm thinking the 40B model is only going to be slower, even on a larger cluster, but I'll try it anyway as soon as I can get the compute. A simplified sketch of what I ran is below.
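This is roughly what my first test looked like (simplified; the model id, prompt template, and generation settings here are illustrative, not my exact setup):

```python
import pandas as pd
from transformers import pipeline

# Falcon required trust_remote_code when I tried it; settings are illustrative
generator = pipeline(
    "text-generation",
    model="tiiuae/falcon-7b-instruct",
    device_map="auto",       # spread the model across available GPUs
    torch_dtype="auto",
    trust_remote_code=True,
)

# Stand-in for my real dataset's text column
df = pd.DataFrame({"text": ["example entry 1", "example entry 2"]})
prompt_template = "Summarize the following text:\n{text}"

# Naive per-row loop -- this is the part that took hours for 80 prompts
df["result"] = [
    generator(prompt_template.format(text=t), max_new_tokens=100)[0]["generated_text"]
    for t in df["text"]
]
```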

What's the recommended approach here? What kind of runtime/results can I expect in the best case?