Scaling Mistral-7B on AWS SageMaker With Multiple Replica Endpoints


I’ve been replicating the workflow outlined in this blog post.

So far, I’ve successfully deployed multiple replicas of the Mistral model on ml.g5.12xlarge and ml.g5.24xlarge instances. I have one question about the MAX_BATCH_TOTAL_TOKENS value we set.

Does this parameter limit the number of tokens that can be processed in parallel:

  • per replica we create, or
  • across all the replicas we create?
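For context, here is roughly how I pass the parameter, sketched as a helper that only builds the container environment (the model ID and values are placeholders; the env var names follow the TGI container convention). My current understanding, which I'd like confirmed, is that TGI reads this variable per server process, so each replica would get its own independent token budget:

```python
def build_tgi_env(max_batch_total_tokens: int, num_gpus: int) -> dict:
    """Environment for one TGI container, i.e. one replica.

    Each replica runs its own TGI server, which reads
    MAX_BATCH_TOTAL_TOKENS from its own environment, so the limit
    appears to apply per replica rather than across the endpoint.
    """
    return {
        "HF_MODEL_ID": "mistralai/Mistral-7B-Instruct-v0.1",  # placeholder model id
        "SM_NUM_GPUS": str(num_gpus),
        "MAX_BATCH_TOTAL_TOKENS": str(max_batch_total_tokens),
    }

# This dict is what I pass as `env=` to the HuggingFaceModel before deploying.
env = build_tgi_env(max_batch_total_tokens=8192, num_gpus=4)
print(env["MAX_BATCH_TOTAL_TOKENS"])
```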