Please advise how to deploy the Mixtral 8x7B model with 4-bit or 8-bit quantization on SageMaker.
I tried the following code, but the deployment progress bar doesn't show up.
from sagemaker.pytorch import PyTorchModel
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import JSONDeserializer

model_data = 's3://path/to/model.tar.gz'

pytorch_model = PyTorchModel(
    model_data=model_data,
    role=role,
    source_dir='code',
    framework_version='1.12.1',
    entry_point='inference.py',
    py_version='py38',
    model_server_workers=1,
)

predictor = pytorch_model.deploy(
    initial_instance_count=1,
    instance_type='ml.g4dn.12xlarge',
    endpoint_name='test-mistral8x7b',
    serializer=CSVSerializer(),
    deserializer=JSONDeserializer(),
)
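When the progress bar doesn't appear, it can help to poll the endpoint status directly instead of relying on the SDK's waiter. A minimal sketch using boto3 (the endpoint name is the one from the deploy call above; `wait_for_endpoint` and `is_terminal` are hypothetical helper names):

```python
import time


def is_terminal(status):
    """True once the endpoint has finished creating (successfully or not)."""
    return status in ("InService", "Failed", "OutOfService")


def wait_for_endpoint(endpoint_name, poll_seconds=30):
    """Poll SageMaker until the endpoint leaves the 'Creating' state."""
    import boto3  # imported here so the pure helper above stays usable offline

    client = boto3.client("sagemaker")
    while True:
        status = client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        print(f"{endpoint_name}: {status}")
        if is_terminal(status):
            return status
        time.sleep(poll_seconds)
```

If `wait_for_endpoint("test-mistral8x7b")` ends in `Failed`, the endpoint's CloudWatch logs usually show the container-side error (e.g. a crash in `model_fn`).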
the code folder contains inference.py & requirements.txt
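Since model_fn uses load_in_4bit=True, requirements.txt would also need the quantization dependencies; roughly something like this (version pins are illustrative, not tested):

```
transformers>=4.31.0
accelerate>=0.21.0
bitsandbytes>=0.40.0
```

Without bitsandbytes and accelerate installed in the container, from_pretrained with load_in_4bit=True raises at model load time, which would make the endpoint fail silently from the notebook's point of view.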
This is the inference code:
from transformers import AutoModelForCausalLM, AutoTokenizer

def model_fn(model_dir):
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir, load_in_4bit=True)
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    text = data.pop("text", data)
    # unpack model and tokenizer
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(text, return_tensors="pt").to(0)
    outputs = model.generate(**inputs, max_new_tokens=20)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
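One thing worth noting: the deploy call uses CSVSerializer, but predict_fn expects a dict with a "text" key, and the default PyTorch handler may not map a text/csv body to that shape. A minimal input_fn sketch that could go in inference.py (this is an assumption about your payload format, not tested against it):

```python
import json


def input_fn(request_body, content_type):
    """Convert the raw request body into the dict predict_fn expects."""
    if content_type == "text/csv":
        # Treat the whole CSV body as a single prompt string
        return {"text": request_body.strip()}
    if content_type == "application/json":
        return json.loads(request_body)
    raise ValueError(f"Unsupported content type: {content_type}")
```

With this in place, `predictor.predict("some prompt")` would reach predict_fn as `{"text": "some prompt"}`.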
The model.tar.gz was downloaded and uploaded to S3.