To elaborate on Llama 2 requiring PRO subscription:
Yes, to use it with the Inference API you need a PRO subscription, since the model is too large (roughly 13 GB, above the 10 GB free API limit). Of course, you can still run it locally without any error.
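For context, a call to the hosted Inference API is just an authenticated HTTP POST. Here is a minimal sketch of how such a request would be assembled, assuming a token from a PRO account (`hf_xxx` and the model id are illustrative placeholders, and the request is only built here, not actually sent):

```python
import json

# Illustrative model id; a 13B Llama 2 model exceeds the free-tier size limit.
API_URL = "https://api-inference.huggingface.co/models/meta-llama/Llama-2-13b-chat-hf"

def build_request(prompt: str, token: str):
    """Assemble the URL, headers, and JSON body for an Inference API call.

    The token must belong to an account allowed to use this model
    (a PRO subscription, per the discussion above). Nothing is sent here;
    pass these pieces to an HTTP client such as `requests.post`.
    """
    headers = {"Authorization": f"Bearer {token}"}
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": 64}}
    return API_URL, headers, json.dumps(payload)

url, headers, body = build_request("Hello", "hf_xxx")  # hf_xxx is a placeholder token
```

Without a valid PRO token, the same request against a large gated model returns an error rather than generated text.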
I could not find a reference to this in the docs. To state it differently: the free Inference API is rate limited (by number of requests?), available only for models < 10 GB, and (possibly) excludes all Llama 2 models.