Error while trying to load the "deepseek-ai/DeepSeek-V3" model

I am using the following code to run DeepSeek-V3.

My code:

!pip install torch==2.4.1
!pip install torchvision==0.19.1
!pip install triton==3.0.0
# !pip install transformers==4.46.3
!pip install transformers==4.36.2
!pip install bitsandbytes==0.41.2
!pip install safetensors==0.4.5
!pip install "accelerate>=0.26.0"
!git clone https://github.com/deepseek-ai/DeepSeek-V3.git
%cd DeepSeek-V3/inference
!pip install -r requirements.txt 
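
Before loading anything, it can help to confirm that the pinned packages actually resolved (a quick sanity check in a fresh kernel after the installs; nothing DeepSeek-specific):

import torch, transformers, bitsandbytes, accelerate
print(torch.__version__, transformers.__version__,
      bitsandbytes.__version__, accelerate.__version__)
print("CUDA available:", torch.cuda.is_available())
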
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,                  # quantize weights to 4-bit on load
    bnb_4bit_compute_dtype="float16",   # run compute in fp16
    bnb_4bit_use_double_quant=True      # nested quantization to save extra memory
)

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V3",
    quantization_config=quantization_config,
    device_map="auto",        # spread layers across available GPUs/CPU
    trust_remote_code=True    # model ships custom modeling code on the Hub
)

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3")
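
For reference, once the model loads this is how I intend to run generation (a minimal sketch; the prompt and generation settings are only placeholders):

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))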

The error I am getting:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[9], line 9
      1 from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
      3 quantization_config = BitsAndBytesConfig(
      4     load_in_4bit=True,  
      5     bnb_4bit_compute_dtype="float16",
      6     bnb_4bit_use_double_quant=True
      7 )
----> 9 model = AutoModelForCausalLM.from_pretrained(
     10     "deepseek-ai/DeepSeek-V3",
     11     quantization_config=quantization_config,
     12     device_map="auto",
     13     trust_remote_code=True
     14 )
     16 tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3")

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:559, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    557     cls.register(config.__class__, model_class, exist_ok=True)
    558     model_class = add_generation_mixin_to_remote_model(model_class)
--> 559     return model_class.from_pretrained(
    560         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    561     )
    562 elif type(config) in cls._model_mapping.keys():
    563     model_class = _get_model_class(config, cls._model_mapping)
...
    100     )
    102 target_cls = AUTO_QUANTIZATION_CONFIG_MAPPING[quant_method]
    103 return target_cls.from_dict(quantization_config_dict)

ValueError: Unknown quantization type, got fp8 - supported types are: ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto', 'eetq', 'hqq', 'compressed-tensors', 'fbgemm_fp8', 'torchao', 'bitnet']
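
From what I can tell, the checkpoint's config.json declares fp8 as its quantization method, which my installed transformers version does not recognize. This can be confirmed without downloading the weights (a minimal sketch using huggingface_hub, purely illustrative):

from huggingface_hub import hf_hub_download
import json

# Fetch only the small config file, not the very large weight shards.
config_path = hf_hub_download("deepseek-ai/DeepSeek-V3", "config.json")
with open(config_path) as f:
    config = json.load(f)

# For DeepSeek-V3 this prints "fp8", the type rejected in the traceback above.
print(config.get("quantization_config", {}).get("quant_method"))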

Could you please help me resolve this issue?

Useful References:

  1. deepseek-ai/DeepSeek-V3 · CUDA out of memory error during fp8 to bf16 model conversion + fix
  2. GitHub - deepseek-ai/DeepSeek-V3

Loading this model with its native fp8 quantization is not supported in released versions of transformers due to various problems, which is why from_pretrained rejects the fp8 quant_method declared in the model's config. It seems support will be added soon; it may already work with the GitHub (development) version:

!pip install git+https://github.com/huggingface/transformers
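
After installing the development build, you can check whether fp8 is now a recognized quantization method before retrying the load (a minimal sketch; AUTO_QUANTIZATION_CONFIG_MAPPING is the mapping shown in your traceback, and the import path below assumes the same module layout):

import transformers
print(transformers.__version__)

# If "fp8" appears in this mapping, from_pretrained should accept the
# quantization config declared in the model's config.json.
from transformers.quantizers.auto import AUTO_QUANTIZATION_CONFIG_MAPPING
print(sorted(AUTO_QUANTIZATION_CONFIG_MAPPING.keys()))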