How to deploy quantized Mixtral 8x7B on SageMaker?

Please advise how to deploy a 4-bit or 8-bit quantized Mixtral 8x7B model in SageMaker.
I tried the following code, but the deployment progress bar doesn't show up.

from sagemaker.pytorch import PyTorchModel
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import JSONDeserializer

model_data = 's3://path/to/model.tar.gz'

pytorch_model = PyTorchModel(
    model_data=model_data,
    role=role,
    source_dir='code',
    framework_version='1.12.1',
    entry_point='inference.py',
    py_version='py38',
    model_server_workers=1,
)

predictor = pytorch_model.deploy(
    initial_instance_count=1,
    instance_type='ml.g4dn.12xlarge',
    endpoint_name='test-mistral8x7b',
    serializer=CSVSerializer(),
    deserializer=JSONDeserializer(),
)
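In some notebook environments the progress bar simply doesn't render, so one way to check whether the endpoint is actually being created is to poll its status with boto3. This is just a sketch; the endpoint name matches the deploy() call above, and the helper function name is my own:

```python
def endpoint_status(name, client=None):
    """Return the SageMaker endpoint status, e.g. Creating / InService / Failed."""
    if client is None:
        import boto3  # assumes AWS credentials are configured
        client = boto3.client("sagemaker")
    return client.describe_endpoint(EndpointName=name)["EndpointStatus"]

# e.g. endpoint_status("test-mistral8x7b") returns "Creating" while deployment is in progress
```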

The code folder contains inference.py and requirements.txt.
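For reference, here is a packaging sketch following the standard SageMaker PyTorch layout. The directory names and the requirements list are my assumptions (bitsandbytes and accelerate are needed for `load_in_4bit`):

```shell
# Assumed layout (paths are placeholders):
#   model/            -> model weights, packed into model.tar.gz
#   code/inference.py -> entry point script
#   code/requirements.txt
mkdir -p model code
printf 'transformers\naccelerate\nbitsandbytes\n' > code/requirements.txt
tar -czf model.tar.gz -C model .   # source_dir='code' is uploaded separately by the SDK
```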

This is the inference code:

from transformers import AutoModelForCausalLM, AutoTokenizer

def model_fn(model_dir):

  # bitsandbytes and accelerate must be listed in code/requirements.txt for 4-bit loading
  tokenizer = AutoTokenizer.from_pretrained(model_dir)
  model = AutoModelForCausalLM.from_pretrained(model_dir, load_in_4bit=True, device_map="auto")
  return model, tokenizer

def predict_fn(data, model_and_tokenizer):

  # unpack model and tokenizer
  model, tokenizer = model_and_tokenizer
  text = data.pop("text", data)
  inputs = tokenizer(text, return_tensors="pt").to(0)  # move input tensors to GPU 0
  outputs = model.generate(**inputs, max_new_tokens=20)
  return tokenizer.decode(outputs[0], skip_special_tokens=True)

The model.tar.gz was downloaded and then uploaded to S3.