Inference API offline model limit

The model is stored permanently in your local cache. I would suggest exporting it from the cache and hosting it on your own cloud instance instead.
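
As a minimal sketch of what that could look like, assuming you are using the Hugging Face `transformers` library (the model id and directory paths below are placeholders for illustration):

```python
from transformers import AutoModel, AutoTokenizer

model_id = "bert-base-uncased"  # placeholder; substitute your own model id

# First run downloads the weights into the local cache
# (~/.cache/huggingface by default).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Export a self-contained copy that you can upload to your cloud instance.
tokenizer.save_pretrained("./my_model")
model.save_pretrained("./my_model")

# On the cloud instance, load fully offline from the exported directory:
# tokenizer = AutoTokenizer.from_pretrained("/path/on/server/my_model", local_files_only=True)
# model = AutoModel.from_pretrained("/path/on/server/my_model", local_files_only=True)
```

Once the exported directory is on your instance, loading with `local_files_only=True` avoids any further downloads or dependence on the Inference API.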