Choosing a hosting or endpoint option to run BART-CNN

Hi,

I’ve found a model that suits my use case for prototyping. However, the free Inference API only accepts about 4-5 input prompts per request, while my project needs to send closer to 12-35 separate inputs wrapped into one request.
e.g.:

```json
{
  "inputs": ["1", "2", "3", … , "35"]
}
```
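In the meantime, the workaround I’m considering is splitting the big batch into smaller chunks and sending one request per chunk. Here’s a rough sketch — the chunk size of 4 is just my guess at what the free tier tolerates, and the payload shape assumes the format above:

```python
def chunk(items, size=4):
    """Split a list into consecutive sublists of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

inputs = [str(n) for n in range(1, 36)]  # 35 inputs, as in the example above
batches = chunk(inputs, size=4)          # 9 batches: eight of 4, one of 3

# Each batch would then be sent as its own request body, e.g.
# {"inputs": batch} — one POST per batch instead of one big request.
```

But obviously one request per chunk is slower and clunkier than a single batched call, which is why I’m asking about the dedicated options.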

Is this something I can do with the basic $0.032/hour dedicated server option? Or do I need a more powerful server? And is there still rate limiting on how many requests you can send to a bart-cnn model on a dedicated instance?

To sum up, I don’t understand why my requests are being rate limited: anything with more than 4-5 inputs comes back with a 500 error. Does that restriction go away if I pay for a dedicated endpoint? And can the cheapest instance handle BART, or would I need a more expensive plan to run it?