Comparing Inference Instances for Text Embedding and Completion Tasks


I’m interested in deploying an LLM (large language model) instance to perform the following task:

Taking a large amount of text, generating embeddings for it, and then running completion prompts over that embedded content.

I’ve come across various options for inference instances, and I would like to better understand the following:


  1. What specific factors should I consider when selecting an inference instance for this task?
  2. What are the advantages and disadvantages of using Vertex AI, SageMaker, and Hugging Face (HF) instances for this purpose?
  3. Are there any relevant documents or resources that could help me decide which instance best suits my needs?

Thank you!


Great topic! Following.