(Tips) Optimizing Underutilized Resources

tech-untukmu-ai · November 15, 2023, 2:10pm

Hi there, I’d like to know your opinion on how to optimize the following setup.

This dedicated endpoint runs on one NVIDIA Tesla T4 (16 GB) and serves sentence-transformers/clip-ViT-B-32-multilingual-v1 for sentence embeddings. I hit the endpoint with chunks of 10,000 sentences per request.

The issue is that GPU usage appears to be fully allocated, while other resources seem underutilized. Do you have any advice on optimization? Maybe async querying, increasing the batch size beyond 10k, or other ideas.
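To make the async querying idea concrete, here is a rough sketch of what I have in mind: split the sentences into smaller chunks and keep a bounded number of requests in flight at once. Everything named here is an assumption for illustration: `embed_concurrently`, `split_into_chunks`, the chunk size of 1,000, and the limit of 4 concurrent requests are not tuned values, and `fake_send_chunk` is a stand-in for the real HTTP call to the endpoint, so the snippet runs without network access.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

# Hypothetical signature for whatever function actually calls the endpoint.
SendChunk = Callable[[Sequence[str]], Awaitable[List[List[float]]]]


def split_into_chunks(items: Sequence, chunk_size: int) -> List[Sequence]:
    """Split items into consecutive chunks of at most chunk_size elements."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


async def embed_concurrently(
    sentences: Sequence[str],
    send_chunk: SendChunk,
    chunk_size: int = 1000,   # assumed value, not a recommendation
    max_in_flight: int = 4,   # assumed value, not a recommendation
) -> List[List[float]]:
    """Send chunks concurrently, with at most max_in_flight requests at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(chunk: Sequence[str]) -> List[List[float]]:
        # The semaphore caps how many requests are outstanding at a time.
        async with semaphore:
            return await send_chunk(chunk)

    chunks = split_into_chunks(sentences, chunk_size)
    # gather returns results in the same order as the submitted chunks.
    results = await asyncio.gather(*(guarded(c) for c in chunks))
    return [vector for chunk_result in results for vector in chunk_result]


async def fake_send_chunk(chunk: Sequence[str]) -> List[List[float]]:
    """Placeholder for the real request: returns one dummy vector per sentence."""
    await asyncio.sleep(0.01)  # simulates network and inference latency
    return [[float(len(sentence))] for sentence in chunk]


if __name__ == "__main__":
    sentences = [f"sentence {i}" for i in range(10_000)]
    vectors = asyncio.run(embed_concurrently(sentences, fake_send_chunk))
    print(len(vectors))
```

Whether this helps at all presumably depends on where the time goes. If the GPU is already saturated, overlapping requests might only hide client-side and network latency rather than raise throughput.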

Cheers
