Well, I just realized that in my case I’m running a GGUF in a llama.cpp container, so maybe the OpenAI-compatible endpoint is the only one available in that container? The docs say this:
" You can deploy any llama.cpp compatible GGUF on the Hugging Face Endpoints. When you create an endpoint with a GGUF model, a llama.cpp container is automatically selected using the latest image built from the master
branch of the llama.cpp repository. Upon successful deployment, a server with an OpenAI-compatible endpoint becomes available."
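If that's the case, talking to the deployed model through the standard OpenAI client should work. Here's a minimal sketch; the endpoint URL, token, and model name below are placeholders for your own deployment, not values from the docs:

```python
# Minimal sketch: querying a llama.cpp-backed Hugging Face Endpoint
# through its OpenAI-compatible API (openai>=1.0 client).
from openai import OpenAI

client = OpenAI(
    # Placeholder: substitute your endpoint's URL, keeping the /v1/ suffix.
    base_url="https://<your-endpoint>.endpoints.huggingface.cloud/v1/",
    # Placeholder: your Hugging Face access token.
    api_key="hf_xxx",
)

response = client.chat.completions.create(
    # llama.cpp's server serves a single loaded model, so this name is
    # typically not used for routing; any placeholder string should do.
    model="my-gguf-model",
    messages=[{"role": "user", "content": "Hello! Are you up and running?"}],
)
print(response.choices[0].message.content)
```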