I’m interested in deploying an LLM (Large Language Model) instance to perform the following task:
Ingesting a large amount of text, embedding it, and running completion prompts on top of the embedded content.
I’ve come across various options for inference instances, and I’d like to understand the following:
- What specific parameters should I consider when selecting an inference instance for this task?
- What are the advantages and disadvantages of using Vertex AI, SageMaker, and HF (Hugging Face) inference instances for this purpose?
- Are there any relevant documents or resources that can assist me in deciding which instance is most suitable for my needs?
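For context, here is a toy, dependency-free sketch of the embed-then-complete workflow I have in mind. The hash-based "embedding" and the in-memory retrieval are stand-ins (a real deployment would call an embedding model and a hosted completion endpoint on whichever platform I pick); it's only meant to show the shape of the pipeline, not an actual implementation.

```python
# Toy sketch of the workflow: embed a corpus, retrieve the most relevant
# chunks for a query, and build a completion prompt from them.
# NOTE: embed() is a deterministic hashing stand-in, NOT a semantic
# embedding; swap it for a real model/endpoint in practice.
import hashlib
import math

def embed(text: str, dim: int = 256) -> list[float]:
    """Stand-in embedding: hash each token into a bucket, then L2-normalize."""
    vec = [0.0] * dim
    for token in text.lower().split():
        token = token.strip(".,?!")
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two already-normalized vectors (dot product)."""
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus documents most similar to the query."""
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Assemble the prompt that would be sent to the completion endpoint."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "SageMaker endpoints are billed per instance-hour.",
    "Vertex AI offers managed prediction endpoints.",
    "Cats are popular pets.",
]
print(build_prompt("How is SageMaker billed?", docs))
```

Even a sketch like this makes the instance-selection question concrete: the embedding step is throughput-bound over the whole corpus, while the completion step is latency-bound per query, which is why I'm unsure whether one instance type fits both.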