For those training or deploying HF models on Google Cloud GKE - what's your experience like? As a user, I browse the Models page on the Hub, pick the model I want to fine-tune / deploy, and then it's a personal journey of figuring it out on my own. So far (taking inference as an example) it's been roughly this (quick sketches of each step are below the list):
- Download model artifacts manually from the HF Model Hub
- Package a base serving image (like TF Serving or TGI) with the model artifacts and run it locally
- Once happy with the serving results, upload the image to Artifact Registry
- Spin up a GKE cluster if one isn't already available, write Deployment and Service manifests, and deploy the image (Cloud Run is an enticing alternative)
- Hit the Service's external IP to serve results
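To make that concrete, here's roughly what each of those steps looks like on my end. All the names, paths, image tags and IPs below are placeholders, so treat these as sketches rather than exact commands. Pulling the artifacts is just `snapshot_download` from `huggingface_hub`:

```python
from huggingface_hub import snapshot_download

# Pull the full model repo to a local folder (repo id and target dir are placeholders).
local_dir = snapshot_download(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",
    local_dir="/home/me/models/mistral-7b",
)
print(local_dir)
```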
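Running the serving image locally is essentially a `docker run` of the TGI container pointed at that folder; through the Docker Python SDK the same thing looks roughly like this (image tag, mount path and port mapping are assumptions on my part):

```python
import docker

client = docker.from_env()

# Start TGI locally, with --model-id pointing at the downloaded weights mounted into the container.
container = client.containers.run(
    "ghcr.io/huggingface/text-generation-inference:latest",
    command=["--model-id", "/data/mistral-7b"],
    volumes={"/home/me/models": {"bind": "/data", "mode": "rw"}},
    ports={"80/tcp": 8080},  # TGI listens on port 80 inside the container
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],
    detach=True,
)
print(container.logs(tail=20).decode())
```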
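On the GKE side I hand-write the Deployment and Service YAML; this kubernetes-client sketch creates objects of the same shape (the Artifact Registry image path, labels and GPU request are placeholders), with the LoadBalancer Service being what eventually hands back the external IP:

```python
from kubernetes import client, config

config.load_kube_config()  # assumes kubectl is already pointed at the GKE cluster

labels = {"app": "tgi"}
container = client.V1Container(
    name="tgi",
    image="us-docker.pkg.dev/my-project/serving/tgi-mistral:latest",  # placeholder Artifact Registry path
    ports=[client.V1ContainerPort(container_port=80)],
    resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
)
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="tgi"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels=labels),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels=labels),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)
service = client.V1Service(
    metadata=client.V1ObjectMeta(name="tgi"),
    spec=client.V1ServiceSpec(
        type="LoadBalancer",  # this is what provisions the external IP
        selector=labels,
        ports=[client.V1ServicePort(port=80, target_port=80)],
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
client.CoreV1Api().create_namespaced_service(namespace="default", body=service)
```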
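And serving is then just posting to TGI's `/generate` endpoint on whatever external IP the Service gets (the IP here is made up):

```python
import requests

# Replace with the EXTERNAL-IP reported by `kubectl get svc tgi`.
url = "http://203.0.113.10:80/generate"

resp = requests.post(
    url,
    json={
        "inputs": "What is Kubernetes?",
        "parameters": {"max_new_tokens": 128},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```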
Some models come with guidance on training / deploying for inference, which helps a bit. I'd like to learn from others here:
- How do you decide which model to use for your use case, given the pace of releases (adjacent question, but curious to know)?
- What does your stack look like for training / deploying on GKE?
- Any major pain points around model discovery / fine-tuning / serving?