For those training or deploying HF models on Google Cloud GKE - what's your experience like? As a user, I browse the Models page on the Hub, pick the model I want to fine-tune / deploy, and then it's a personal journey of figuring it out on my own. So far (taking inference as an example) it's been roughly this (quick sketches of each step are below the list):
- Download model artifacts manually from the HF Model Hub
- Package a base serving image (like TF Serving or TGI) with the model artifacts and run it locally
- Once happy with the serving results, upload the image to Artifact Registry
- Spin up a GKE cluster if one isn't already available, write Deployment and Service manifests, and deploy the image (Cloud Run is an enticing alternative)
- Hit the Service's external IP to serve results
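To make that concrete, here's roughly what each of those steps looks like on my end. All the names, paths, image tags and IPs below are placeholders, so treat these as sketches rather than exact commands. Pulling the artifacts is just `snapshot_download` from `huggingface_hub`:

```python
from huggingface_hub import snapshot_download

# Pull the full model repo to a local folder (repo id and target dir are placeholders).
local_dir = snapshot_download(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",
    local_dir="/home/me/models/mistral-7b",
)
print(local_dir)
```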
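Running the serving image locally is essentially a `docker run` of the TGI container pointed at that folder; through the Docker Python SDK the same thing looks roughly like this (image tag, mount path and port mapping are assumptions on my part):

```python
import docker

client = docker.from_env()

# Start TGI locally, with --model-id pointing at the downloaded weights mounted into the container.
container = client.containers.run(
    "ghcr.io/huggingface/text-generation-inference:latest",
    command=["--model-id", "/data/mistral-7b"],
    volumes={"/home/me/models": {"bind": "/data", "mode": "rw"}},
    ports={"80/tcp": 8080},  # TGI listens on port 80 inside the container
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],
    detach=True,
)
print(container.logs(tail=20).decode())
```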
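On the GKE side I hand-write the Deployment and Service YAML; this kubernetes-client sketch creates objects of the same shape (the Artifact Registry image path, labels and GPU request are placeholders), with the LoadBalancer Service being what eventually hands back the external IP:

```python
from kubernetes import client, config

config.load_kube_config()  # assumes kubectl is already pointed at the GKE cluster

labels = {"app": "tgi"}
container = client.V1Container(
    name="tgi",
    image="us-docker.pkg.dev/my-project/serving/tgi-mistral:latest",  # placeholder Artifact Registry path
    ports=[client.V1ContainerPort(container_port=80)],
    resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
)
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="tgi"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels=labels),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels=labels),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)
service = client.V1Service(
    metadata=client.V1ObjectMeta(name="tgi"),
    spec=client.V1ServiceSpec(
        type="LoadBalancer",  # this is what provisions the external IP
        selector=labels,
        ports=[client.V1ServicePort(port=80, target_port=80)],
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
client.CoreV1Api().create_namespaced_service(namespace="default", body=service)
```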
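And serving is then just posting to TGI's `/generate` endpoint on whatever external IP the Service gets (the IP here is made up):

```python
import requests

# Replace with the EXTERNAL-IP reported by `kubectl get svc tgi`.
url = "http://203.0.113.10:80/generate"

resp = requests.post(
    url,
    json={
        "inputs": "What is Kubernetes?",
        "parameters": {"max_new_tokens": 128},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```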
Some models come with guidance on training / deploying for inference, which helps a bit. I'd like to learn from others here:
- How do you decide which model to use for your use case, given the pace of releases (adjacent question, but curious to know)?
- What does your stack look like for training / deploying on GKE?
- Any major pain points around model discovery / fine-tuning / serving?