Are there any docs/guides/tutorials to write custom endpoints to be hosted on Huggingface Hub’s inference endpoints?
We’re particularly looking at these models, which do things a little differently from normal Hugging Face AutoModels (they’ll need an A100):
- Unbabel/XCOMET-XXL · Hugging Face
- google/metricx-23-xxl-v2p0 · Hugging Face
- GitHub - google-research/metricx
- GitHub - Unbabel/COMET: A Neural Framework for MT Evaluation
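For context, here’s the kind of custom `handler.py` we’re hoping to write for the COMET models. This is only a rough sketch, assuming the `EndpointHandler` convention from the Inference Endpoints custom-handler docs and the `download_model` / `load_from_checkpoint` / `predict` API from the Unbabel/COMET README; the checkpoint id, payload shape, and `predict()` arguments are our guesses:

```python
# handler.py -- sketch of a custom Inference Endpoints handler for a
# COMET-style MT evaluation model. Everything model-specific here is an
# assumption; metricx would need a different loading/scoring path.
from typing import Any


def to_comet_inputs(data: dict[str, Any]) -> list[dict[str, str]]:
    """Convert the endpoint payload into the list-of-dicts format COMET expects:
    [{"src": ..., "mt": ..., "ref": ...}, ...] ("ref" optional for QE models)."""
    return [
        {
            "src": item["src"],
            "mt": item["mt"],
            **({"ref": item["ref"]} if "ref" in item else {}),
        }
        for item in data["inputs"]
    ]


class EndpointHandler:
    def __init__(self, path: str = ""):
        # comet is imported lazily so this module can be read without the
        # dependency installed; the endpoint image would need `unbabel-comet`.
        from comet import download_model, load_from_checkpoint

        ckpt = download_model("Unbabel/XCOMET-XXL")  # assumption: checkpoint id
        self.model = load_from_checkpoint(ckpt)

    def __call__(self, data: dict[str, Any]) -> dict[str, Any]:
        batch = to_comet_inputs(data)
        # gpus=1 assumes the A100 instance mentioned above.
        output = self.model.predict(batch, batch_size=8, gpus=1)
        return {"scores": output.scores}
```

A client would then POST something like `{"inputs": [{"src": "...", "mt": "...", "ref": "..."}]}` to the endpoint, but we’re not sure this is the intended pattern for models this large, hence the question.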
And potentially also these, which run on a T4: