Loading model from local disk

I'm trying to load the mlx-community/Mamba-Codestral-7B-v0.1-8bit model from disk.

from mlx_lm import load, generate
self.model, self.tokenizer = load("../models/Mamba-Codestral-7B-v0.1-8bit_model")

ERROR:root:Model type mamba2 not supported.
codestral-mistral-models | Error loading model mlx-community/Mamba-Codestral-7B-v0.1-8bit: Model type mamba2 not supported.
codestral-mistral-models | ERROR:root:Error checking if model is saved: Model type mamba2 not supported.
codestral-mistral-models | ERROR:root:Error loading model: Model type mamba2 not supported.
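
If it helps narrow things down, my understanding is that mlx-lm maps model_type from config.json to a module under mlx_lm.models, so this error suggests the installed mlx-lm has no mamba2 implementation. A quick check I can run inside the container (just a sketch, nothing model-specific):

import importlib.util
import mlx_lm

# Which mlx-lm version is installed in the container
print("mlx-lm version:", mlx_lm.__version__)
# mlx-lm resolves model_type from config.json to a module under mlx_lm.models,
# so a missing mlx_lm.models.mamba2 would explain "Model type mamba2 not supported."
print("mamba2 module found:", importlib.util.find_spec("mlx_lm.models.mamba2") is not None)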

OR

from transformers import AutoModelForCausalLM, AutoTokenizer, Mamba2ForCausalLM
self.tokenizer = AutoTokenizer.from_pretrained(self.save_dir)
self.model = AutoModelForCausalLM.from_pretrained(self.save_dir)
# self.model = Mamba2ForCausalLM.from_pretrained(self.save_dir)

Error loading model mlx-community/Mamba-Codestral-7B-v0.1-8bit: The model’s quantization config from the arguments has no quant_method attribute. Make sure that the model has been correctly quantized…
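
If I read that second error right, transformers expects a quant_method entry (e.g. for bitsandbytes or GPTQ checkpoints) inside quantization_config, while this MLX export only carries group_size/bits. A quick sketch to inspect the local clone (save_dir is my local model directory from above):

import json
from pathlib import Path

save_dir = "../models/Mamba-Codestral-7B-v0.1-8bit_model"  # my local clone
# Show what the quantization_config block actually contains
cfg = json.loads((Path(save_dir) / "config.json").read_text())
print(cfg.get("quantization_config"))  # {'group_size': 64, 'bits': 8} - no 'quant_method'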

The model was cloned with git clone https://huggingface.co/mlx-community/Mamba-Codestral-7B-v0.1-8bit

Model content:
.git
.gitattributes
README.md
config.json
model-00001-of-00002.safetensors
model-00002-of-00002.safetensors
model.safetensors.index.json
special_tokens_map.json
tokenizer.json
tokenizer.model
tokenizer_config.json

config.json:

{
    "architectures": [
        "Mamba2ForCausalLM"
    ],
    "bos_token_id": 0,
    "chunk_size": 256,
    "conv_kernel": 4,
    "eos_token_id": 0,
    "expand": 2,
    "head_dim": 64,
    "hidden_act": "silu",
    "hidden_size": 4096,
    "initializer_range": 0.1,
    "intermediate_size": 8192,
    "layer_norm_epsilon": 1e-05,
    "model_type": "mamba2",
    "n_groups": 8,
    "norm_before_gate": true,
    "num_heads": 128,
    "num_hidden_layers": 64,
    "pad_token_id": 0,
    "quantization": {
        "group_size": 64,
        "bits": 8
    },
    "quantization_config": {
        "group_size": 64,
        "bits": 8
    },
    "rescale_prenorm_residual": false,
    "residual_in_fp32": true,
    "rms_norm": true,
    "state_size": 128,
    "tie_word_embeddings": false,
    "time_step_floor": 0.0001,
    "time_step_init_scheme": "random",
    "time_step_limit": [
        0.0,
        Infinity
    ],
    "time_step_max": 0.1,
    "time_step_min": 0.001,
    "time_step_rank": 256,
    "time_step_scale": 1.0,
    "torch_dtype": "bfloat16",
    "transformers_version": "4.44.0.dev0",
    "use_bias": false,
    "use_cache": true,
    "use_conv_bias": true,
    "vocab_size": 32768
}

Dockerfile:

# Use a base image with PyTorch and GPU support
FROM huggingface/transformers-pytorch-gpu:latest

# Set working directory inside the container
WORKDIR /workspace

# Install necessary packages
RUN apt-get update && \
    pip3 install "mistral_inference>=1" mamba-ssm causal-conv1d && \
    pip3 install langchain torch accelerate flask pydantic uvicorn mlx-lm && \
    pip3 install python-dotenv langchain-community langchain-huggingface llama-index llama-index-embeddings-huggingface && \
    pip3 install peft auto-gptq optimum bitsandbytes sentence-transformers numpy fastapi ...

The model mlx-community/Mamba-Codestral-7B-v0.1-8bit was converted to MLX format from mistralai/Mamba-Codestral-7B-v0.1 using mlx-lm version 0.18.2.

So, let’s try installing the latest version first.

pip install -U mlx-lm
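
After upgrading, it is worth confirming which version actually ends up inside the container, e.g. something like:

python3 -c "import mlx_lm; print(mlx_lm.__version__)"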

Thanks John6666,

I initially tried installing the latest version as you suggested:

pip install -U mlx-lm

Unfortunately, this didn’t resolve the issue.

ERROR:root:Model type mamba2 not supported.
codestral-mistral-models | Error loading model mlx-community/Mamba-Codestral-7B-v0.1-8bit: Model type mamba2 not supported.
codestral-mistral-models | ERROR:root:Error checking if model is saved: Model type mamba2 not supported.
codestral-mistral-models | ERROR:root:Error loading model: Model type mamba2 not supported.
