LoRA-adapted DeepSeek R1 not working with Inference Endpoints

bhaskars113 · April 2, 2025, 10:53am

I trained the deepseek-ai/DeepSeek-R1-Distill-Llama-8B model with a standard LoRA adapter training pipeline, then pushed the merged model to the Hugging Face Hub with the model.push_to_hub_merged() method. When I try to load it with Hugging Face Inference Endpoints, I keep getting an error. My model name and the error log are below. Any help would be appreciated, thanks!

Name of model - bhaskars113/DeepSeek-R1-Entity-8B-V1.1

Error:

Apr 02, 16:20:29	WARN	
🚨🚨BREAKING CHANGE in 2.0🚨🚨: Safetensors conversion is disabled without `--trust-remote-code` because Pickle files are unsafe and can essentially contain remote code execution!Please check for more information here: https://huggingface.co/docs/text-generation-inference/basic_tutorials/safety
Apr 02, 16:20:29	WARN	
No safetensors weights found for model /repository at revision None. Converting PyTorch weights to safetensors.
Apr 02, 16:20:37	ERROR	
: DownloadError
Apr 02, 16:20:37	INFO	
{"timestamp":"2025-04-02T10:50:37.264752Z","level":"ERROR","fields":{"message":"Download encountered an error: \n2025-04-02 10:50:27.692 |      | text_generation_server.utils.import_utils:<module>:76 - Detected system cuda\n╭───────────────────── Traceback (most recent call last) ──────────────────────╮\n│ /usr/src/server/text_generation_server/cli.py:335 in download_weights        │\n│                                                                              │\n│   332 │   │   except Exception:                                              │\n│   333 │   │   │   discard_names = []                                         │\n│   334 │   │   # Convert pytorch weights to safetensors                       │\n│ ❱ 335 │   │   utils.convert_files(local_pt_files, local_st_files, discard_na │\n│   336                                                                        │\n│   337                                                                        │\n│   338 @app.command()                                                         │\n│                                                                              │\n│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │\n│ │      architecture = 'LlamaForCausalLM'                                   │ │\n│ │      auto_convert = True                                                 │ │\n│ │     base_model_id = None                                                 │ │\n│ │            config = {                                                    │ │\n│ │                     │   'architectures': ['LlamaForCausalLM'],           │ │\n│ │                     │   'attention_bias': False,                         │ │\n│ │                     │   'attention_dropout': 0.0,                        │ │\n│ │                     │   'bos_token_id': 128000,                          │ │\n│ │                     │   'eos_token_id': 128001,                          │ │\n│ │                     │   'head_dim': 128,          
                       │ │\n│ │                     │   'hidden_act': 'silu',                            │ │\n│ │                     │   'hidden_size': 4096,                             │ │\n│ │                     │   'initializer_range': 0.02,                       │ │\n│ │                     │   'intermediate_size': 14336,                      │ │\n│ │                     │   ... +18                                          │ │\n│ │                     }                                                    │ │\n│ │   config_filename = '/repository/config.json'                            │ │\n│ │     discard_names = ['lm_head.weight']                                   │ │\n│ │         extension = '.safetensors'                                       │ │\n│ │                 f = <_io.TextIOWrapper name='/repository/config.json'    │ │\n│ │                     mode='r' encoding='utf-8'>                           │ │\n│ │    is_local_model = True                                                 │ │\n│ │              json = <module 'json' from                                  │ │\n│ │                     '/root/.local/share/uv/python/cpython-3.11.11-linux… │ │\n│ │       json_output = True                                                 │ │\n│ │    local_pt_files = [                                                    │ │\n│ │                     │                                                    │ │\n│ │                     PosixPath('/repository/pytorch_model-00001-of-00007… │ │\n│ │                     │                                                    │ │\n│ │                     PosixPath('/repository/pytorch_model-00002-of-00007… │ │\n│ │                     │                                                    │ │\n│ │                     PosixPath('/repository/pytorch_model-00003-of-00007… │ │\n│ │                     │                                                    │ │\n│ │                     PosixPath('/repository/pytorch_model-00004-of-00007… │ │\n│ │ 
                    │                                                    │ │\n│ │                     PosixPath('/repository/pytorch_model-00005-of-00007… │ │\n│ │                     │                                                    │ │\n│ │                     PosixPath('/repository/pytorch_model-00006-of-00007… │ │\n│ │                     │                                                    │ │\n│ │                     PosixPath('/repository/pytorch_model-00007-of-00007… │ │\n│ │                     ]                                                    │ │\n│ │    local_st_files = [                                                    │ │\n│ │                     │                                                    │ │\n│ │                     PosixPath('/repository/model-00001-of-00007.safeten… │ │\n│ │                     │                                                    │ │\n│ │                     PosixPath('/repository/model-00002-of-00007.safeten… │ │\n│ │                     │                                                    │ │\n│ │                     PosixPath('/repository/model-00003-of-00007.safeten… │ │\n│ │                     │                                                    │ │\n│ │                     PosixPath('/repository/model-00004-of-00007.safeten… │ │\n│ │                     │                                                    │ │\n│ │                     PosixPath('/repository/model-00005-of-00007.safeten… │ │\n│ │                     │                                                    │ │\n│ │                     PosixPath('/repository/model-00006-of-00007.safeten… │ │\n│ │                     │                                                    │ │\n│ │                     PosixPath('/repository/model-00007-of-00007.safeten… │ │\n│ │                     ]                                                    │ │\n│ │      logger_level = 'INFO'                                               │ │\n│ │        merge_lora = True        
                                         │ │\n│ │          model_id = '/repository'                                        │ │\n│ │          revision = None                                                 │ │\n│ │      transformers = <module 'transformers' from                          │ │\n│ │                     '/usr/src/.venv/lib/python3.11/site-packages/transf… │ │\n│ │ trust_remote_code = False                                                │ │\n│ │             utils = <module 'text_generation_server.utils' from          │ │\n│ │                     '/usr/src/server/text_generation_server/utils/__ini… │ │\n│ ╰──────────────────────────────────────────────────────────────────────────╯ │\n│                                                                              │\n│ /usr/src/server/text_generation_server/utils/convert.py:112 in convert_files │\n│                                                                              │\n│   109 │   │   │   continue                                                   │\n│   110 │   │                                                                  │\n│   111 │   │   start = datetime.datetime.now()                                │\n│ ❱ 112 │   │   convert_file(pt_file, sf_file, discard_names)                  │\n│   113 │   │   elapsed = datetime.datetime.now() - start                      │\n│   114 │   │   logger.info(f\"Convert: [{i + 1}/{N}] -- Took: {elapsed}\")      │\n│   115                                                                        │\n│                                                                              │\n│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │\n│ │ discard_names = ['lm_head.weight']                                       │ │\n│ │             i = 0                                                        │ │\n│ │             N = 7                                                        │ │\n│ │       pt_file = 
PosixPath('/repository/pytorch_model-00001-of-00007.bin… │ │\n│ │      pt_files = [                                                        │ │\n│ │                 │                                                        │ │\n│ │                 PosixPath('/repository/pytorch_model-00001-of-00007.bin… │ │\n│ │                 │                                                        │ │\n│ │                 PosixPath('/repository/pytorch_model-00002-of-00007.bin… │ │\n│ │                 │                                                        │ │\n│ │                 PosixPath('/repository/pytorch_model-00003-of-00007.bin… │ │\n│ │                 │                                                        │ │\n│ │                 PosixPath('/repository/pytorch_model-00004-of-00007.bin… │ │\n│ │                 │                                                        │ │\n│ │                 PosixPath('/repository/pytorch_model-00005-of-00007.bin… │ │\n│ │                 │                                                        │ │\n│ │                 PosixPath('/repository/pytorch_model-00006-of-00007.bin… │ │\n│ │                 │                                                        │ │\n│ │                 PosixPath('/repository/pytorch_model-00007-of-00007.bin… │ │\n│ │                 ]                                                        │ │\n│ │       sf_file = PosixPath('/repository/model-00001-of-00007.safetensors… │ │\n│ │      sf_files = [                                                        │ │\n│ │                 │                                                        │ │\n│ │                 PosixPath('/repository/model-00001-of-00007.safetensors… │ │\n│ │                 │                                                        │ │\n│ │                 PosixPath('/repository/model-00002-of-00007.safetensors… │ │\n│ │                 │                                                        │ │\n│ │                 
PosixPath('/repository/model-00003-of-00007.safetensors… │ │\n│ │                 │                                                        │ │\n│ │                 PosixPath('/repository/model-00004-of-00007.safetensors… │ │\n│ │                 │                                                        │ │\n│ │                 PosixPath('/repository/model-00005-of-00007.safetensors… │ │\n│ │                 │                                                        │ │\n│ │                 PosixPath('/repository/model-00006-of-00007.safetensors… │ │\n│ │                 │                                                        │ │\n│ │                 PosixPath('/repository/model-00007-of-00007.safetensors… │ │\n│ │                 ]                                                        │ │\n│ │         start = datetime.datetime(2025, 4, 2, 10, 50, 29, 649491)        │ │\n│ ╰──────────────────────────────────────────────────────────────────────────╯ │\n│                                                                              │\n│ /usr/src/server/text_generation_server/utils/convert.py:93 in convert_file   │\n│                                                                              │\n│    90 │   │   pt_tensor = loaded[k]                                          │\n│    91 │   │   sf_tensor = reloaded[k]                                        │\n│    92 │   │   if not torch.equal(pt_tensor, sf_tensor):                      │\n│ ❱  93 │   │   │   raise RuntimeError(f\"The output tensors do not match for k │\n│    94                                                                        │\n│    95                                                                        │\n│    96 def convert_files(pt_files: List[Path], sf_files: List[Path], discard_ │\n│                                                                              │\n│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │\n│ │       dirname = '/repository'                  
                          │ │\n│ │ discard_names = ['lm_head.weight']                                       │ │\n│ │             k = 'model.layers.1.self_attn.k_proj.weight'                 │ │\n│ │        loaded = {                                                        │ │\n│ │                 │   'model.embed_tokens.weight': tensor([[-0.0008,       │ │\n│ │                 0.0095, -0.0044,  ...,  0.0049, -0.0009,  0.0005],       │ │\n│ │                 │   │   [-0.0019,  0.0016, -0.0009,  ...,  0.0016,       │ │\n│ │                 -0.0029,  0.0006],                                       │ │\n│ │                 │   │   [ 0.0050, -0.0173,  0.0038,  ...,  0.0061,       │ │\n│ │                 0.0063,  0.0066],                                        │ │\n│ │                 │   │   ...,                                             │ │\n│ │                 │   │   [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,       │ │\n│ │                 -0.0000, -0.0000],                                       │ │\n│ │                 │   │   [ 0.0000, -0.0000, -0.0000,  ...,  0.0000,       │ │\n│ │                 0.0000, -0.0000],                                        │ │\n│ │                 │   │   [-0.0000, -0.0000,  0.0000,  ...,  0.0000,       │ │\n│ │                 -0.0000, -0.0000]],                                      │ │\n│ │                 │      dtype=torch.float16),                             │ │\n│ │                 │   'model.layers.0.self_attn.q_proj.weight':            │ │\n│ │                 tensor([[-0.0303, -0.0229,  0.0315,  ...,  0.0450,       │ │\n│ │                 -0.0190,  0.0166],                                       │ │\n│ │                 │   │   [-0.0358, -0.0204, -0.0146,  ..., -0.0294,       │ │\n│ │                 0.0561, -0.0159],                                        │ │\n│ │                 │   │   [-0.0416, -0.0110, -0.0236,  ..., -0.0320,       │ │\n│ │                 -0.0151,  0.0200],                                       │ 
│\n│ │                 │   │   ...,                                             │ │\n│ │                 │   │   [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,       │ │\n│ │                 0.0000,  0.0000
Apr 02, 16:20:58	INFO	
Args {
    model_id: "/repository",
    revision: None,
    validation_workers: 2,
    sharded: None,
    num_shard: None,
    quantize: None,
    speculate: None,
    dtype: None,
    kv_cache_dtype: None,
    trust_remote_code: false,
    max_concurrent_requests: 128,
    max_best_of: 2,
    max_stop_sequences: 4,
    max_top_n_tokens: 5,
    max_input_tokens: None,
    max_input_length: None,
    max_total_tokens: None,
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: None,
    max_batch_total_tokens: None,
    max_waiting_tokens: 20,
    max_batch_size: None,
    cuda_graphs: None,
    hostname: "r-113industries-deepseek-r1-entity-8b-v1-1-sih-csfvdr-c1edc-v8y",
    port: 80,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: Some(
        "/repository/cache",
    ),
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 1.0,
    rope_scaling: None,
    rope_factor: None,
    json_output: true,
    otlp_endpoint: None,
    otlp_service_name: "text-generation-inference.router",
    cors_allow_origin: [],
    api_key: None,
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: false,
    max_client_batch_size: 4,
    lora_adapters: None,
    usage_stats: On,
    payload_limit: 2000000,
    enable_prefill_logprobs: false,
}

John6666 · April 2, 2025, 11:42am

No safetensors weights found for model /repository at revision None. Converting PyTorch weights to safetensors.

As the error message says, there are only .bin files in the repository.
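A quick way to confirm this before deploying is to list the files in the repo and count the weight formats. This is only a minimal sketch: it assumes the `huggingface_hub` package is installed, and the helper names `weight_formats` and `repo_weight_formats` are made up for illustration.

```python
import posixpath
from typing import Dict, Iterable


def weight_formats(filenames: Iterable[str]) -> Dict[str, int]:
    """Count weight files by serialization format, judging by file name only."""
    counts = {"safetensors": 0, "bin": 0}
    for name in filenames:
        base = posixpath.basename(name)
        if base.endswith(".safetensors"):
            counts["safetensors"] += 1
        elif base.startswith("pytorch_model") and base.endswith(".bin"):
            counts["bin"] += 1
    return counts


def repo_weight_formats(repo_id: str) -> Dict[str, int]:
    """Fetch the file list of a Hub repo and count its weight formats."""
    # Imported lazily so weight_formats() works without huggingface_hub.
    from huggingface_hub import list_repo_files

    return weight_formats(list_repo_files(repo_id))
```

Calling `repo_weight_formats("bhaskars113/DeepSeek-R1-Entity-8B-V1.1")` needs network access. If the `safetensors` count is zero, the endpoint will try the PyTorch-to-safetensors conversion seen in the log.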

You might need:

model.push_to_hub_merged(***, safe_serialization = None) # save weights as .safetensors
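If re-running the training pipeline is inconvenient, another option (not something I tested on this model) is to reload the already merged checkpoint with plain `transformers` and push it again as safetensors. A rough sketch, assuming `torch` and `transformers` are installed, you are logged in with a write token, the source repo also holds the tokenizer files, and there is enough RAM to hold the 8B weights in float16. The function name and `dst_repo` are placeholders.

```python
def resave_as_safetensors(src_repo: str, dst_repo: str) -> None:
    """Reload a merged checkpoint and push it back to the Hub as .safetensors."""
    # Heavy imports are kept inside the function so defining it stays cheap.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the existing .bin shards of the merged model.
    model = AutoModelForCausalLM.from_pretrained(src_repo, torch_dtype=torch.float16)
    tokenizer = AutoTokenizer.from_pretrained(src_repo)

    # safe_serialization=True asks transformers to write .safetensors shards.
    model.push_to_hub(dst_repo, safe_serialization=True)
    tokenizer.push_to_hub(dst_repo)
```

Pointing `dst_repo` at a new repo keeps the original .bin upload untouched until the new one is confirmed to load.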

Safetensors conversion is disabled without --trust-remote-code because Pickle files are unsafe and can essentially contain remote code execution!

Or pass --trust-remote-code on loading.

Or use GGUF with llama.cpp? It worked normally in my environment.

bhaskars113 · April 22, 2025, 2:05pm

Thanks John! This is really helpful
