RuntimeError: The size of tensor a (48) must match the size of tensor b (64) at non-singleton dimension 0

Hi there,
I have fine-tuned the phi-4-mini-instruct model and tried to deploy it on a Hugging Face Inference Endpoint, but the endpoint fails to start with the following error:
[Server message] Endpoint failed to start
Exit code: 1. Reason: [... truncated tensor dump of ones on device 'cuda:1' ...]
[rank1]: RuntimeError: The size of tensor a (48) must match the size of tensor b (64) at
[rank1]: non-singleton dimension 0"},"target":"text_generation_launcher","span":{"rank":1,"name":"shard-manager"},"spans":[{"rank":1,"name":"shard-manager"}]}
{"timestamp":"2025-04-29T07:29:42.246550Z","level":"ERROR","fields":{"message":"Shard 1 failed to start"},"target":"text_generation_launcher"}
{"timestamp":"2025-04-29T07:29:42.246595Z","level":"INFO","fields":{"message":"Shutting down shards"},"target":"text_generation_launcher"}
{"timestamp":"2025-04-29T07:29:42.269764Z","level":"INFO","fields":{"message":"Terminating shard"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
{"timestamp":"2025-04-29T07:29:42.269807Z","level":"INFO","fields":{"message":"Waiting for shard to gracefully shutdown"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
{"timestamp":"2025-04-29T07:29:42.470177Z","level":"INFO","fields":{"message":"shard terminated"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
Error: ShardCannotStart
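
For context, this class of RuntimeError comes from an elementwise operation on two tensors whose shapes neither match nor broadcast. A minimal PyTorch repro of the same error message (this only illustrates the error class, not what TGI is doing internally):

```python
import torch

# Elementwise ops require the operands' shapes to match or be
# broadcastable; two 1-D tensors of different lengths are neither,
# so adding them raises the same RuntimeError seen in the logs.
a = torch.ones(48)
b = torch.ones(64)
try:
    _ = a + b
except RuntimeError as e:
    print(e)
```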

I then tried deploying the base microsoft-phi-4-mini-instruct model and got the same error.

Can anyone help me resolve this issue?


This seems to be an unresolved issue in TGI?

rdaya
In case anyone runs into this, the trick (a bad one) is to set the environment variable USE_FLASH_ATTENTION=false.
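
If you are running the TGI container yourself, that variable can be passed at launch; a sketch, assuming the official `ghcr.io/huggingface/text-generation-inference` image (the image tag and port mapping here are placeholders you should adapt):

```shell
docker run --gpus all -p 8080:80 \
  -e USE_FLASH_ATTENTION=false \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id microsoft/Phi-4-mini-instruct
```

On a managed Inference Endpoint, the equivalent should be adding USE_FLASH_ATTENTION=false under the endpoint's environment-variable settings. Note this disables the Flash Attention kernels, so expect higher memory use and slower inference; it is a workaround, not a fix.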