Comparing Inference Instances for Text Embedding and Completion Tasks

Smart2233 · May 23, 2023, 11:02am

Hello,

I’m interested in deploying an LLM (large language model) instance to perform the following task:

Task:
Taking a large amount of text, embedding it, and running completion prompts on top of it.
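
To make the task concrete, here is a minimal sketch of one common reading of it: embed chunks of text, retrieve the chunks closest to a question, and place them in a completion prompt. The post does not say retrieval is the goal, so that is an assumption. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and `top_k` and `build_prompt` are hypothetical helper names, not part of any platform's API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble the text that would be sent to a completion model."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical sample data, for illustration only.
chunks = [
    "Invoices are due within 30 days of receipt.",
    "The office is closed on public holidays.",
    "Late invoices incur a 2 percent monthly fee.",
]
question = "When are invoices due?"
prompt = build_prompt(question, top_k(question, chunks, k=2))
print(prompt)
```

In a real deployment, `embed` would call an embedding model and the prompt would go to a completion model, so the instance (or instances) would need to serve both kinds of request.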

I’ve come across various options for inference instances, and I would like to gain a better understanding of the following:

Questions:

1. What specific parameters should I consider when selecting an inference instance for this task?
2. What are the advantages and disadvantages of using Vertex AI, SageMaker, and Hugging Face (HF) instances for this purpose?
3. Are there any relevant documents or resources that can help me decide which instance is most suitable for my needs?

Thank you!

Amagic · May 23, 2023, 11:13am

Great topic! Following.
