Choosing a hosting or endpoint option to run BART-CNN

Hi,

I’ve found a model that suits my use case for prototyping. However, the free Inference API only accepts about 4-5 input prompts per request, while my project needs to send closer to 12-35 separate inputs wrapped into one request.
e.g.:

```json
{
  "inputs": ["1", "2", "3", … , "35"]
}
```
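In the meantime, the workaround I’m considering is splitting the big batch into smaller chunks and sending one request per chunk. Here’s a rough sketch — the chunk size of 4 is just my guess at what the free tier tolerates, and the payload shape assumes the format above:

```python
def chunk(items, size=4):
    """Split a list into consecutive sublists of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

inputs = [str(n) for n in range(1, 36)]  # 35 inputs, as in the example above
batches = chunk(inputs, size=4)          # 9 batches: eight of 4, one of 3

# Each batch would then be sent as its own request body, e.g.
# {"inputs": batch} — one POST per batch instead of one big request.
```

But obviously one request per chunk is slower and clunkier than a single batched call, which is why I’m asking about the dedicated options.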

Is this something I can do with the basic $0.032/hour dedicated server option? Or do I need a more powerful server? And is there still rate limiting on how many requests you can send to a bart-cnn model on a dedicated instance?

To sum up, I don’t understand why my requests are being rate limited: anything with more than 4-5 inputs comes back with a 500 error. Does that restriction go away if I pay for a dedicated endpoint? And can the cheapest instance handle BART, or would I need a more expensive plan to run it?