Hugging Face token usage for routed requests to a custom provider

Hello!
I am currently working on an inference provider integration, and there seems to be a lack of documentation on the tokens used specifically for routed requests.
I have, of course, read this doc: How to be registered as an inference provider on the Hub?

So, the question is: when a user makes a routed request to our backend, it carries their HF token.
On our side, we need to validate that token (via whoami-v2, AFAIK), pass the request on to the inference backend, generate a random UUID for the Inference-ID response header, and store the request so it can be reported to the billing endpoint later.
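To make the flow concrete, here is a minimal stdlib-only sketch of the two provider-side steps (token validation and Inference-ID generation). It assumes the public whoami-v2 endpoint at `https://huggingface.co/api/whoami-v2`; the function names are my own, not anything from the provider spec:

```python
import json
import uuid
import urllib.request

# Public endpoint for resolving/validating an HF token (assumption: this is
# the whoami-v2 endpoint referred to in the provider docs).
HF_WHOAMI_URL = "https://huggingface.co/api/whoami-v2"


def validate_hf_token(token: str) -> dict:
    """Validate a routed request's HF token against whoami-v2.

    Returns the decoded user/org info on success; raises
    urllib.error.HTTPError (401) when the token is invalid.
    """
    req = urllib.request.Request(
        HF_WHOAMI_URL,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def make_inference_id() -> str:
    """Random UUID returned to the client in the Inference-ID response
    header and stored locally so the request can be reported for billing."""
    return str(uuid.uuid4())
```

In a real service the whoami-v2 result would typically be cached for a short TTL so every routed request does not hit the Hub API.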
I see an opportunity to simply abuse this: anyone holding a valid HF token can send it straight to our backend and make multiple "free", non-billed requests.
I don't see any prevention mechanism in the docs, nor any other routed-request info (such as an additional token) that would help.
The only option I can see from here is to check whether a request for a given token gets billed within some time window, and ban the token if not.
Yet that feels like an overcomplicated way to handle this.

Is there anything I am missing?
