When running inference with the vicuna-13b-16k model, I get the following error whenever the context length exceeds 4096 tokens:
```
RuntimeError: The size of tensor a (4096) must match the size of tensor b (4097) at non-singleton dimension 3
```
As I understand it, two parameters in the model config are meant to extend the context length (prompt plus model response) to 16384 tokens:
"max_position_embeddings": 4096,
"rope_scaling": {
"factor": 4.0,
The error occurs in the `self._update_causal_mask` function, on the following line:

```python
padding_mask = causal_mask[..., :mask_length].eq(0.0) * attention_mask[:, None, None, :].eq(0.0)
```
`causal_mask` has shape `[1, 1, 4096, 4096]`.
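The shape clash is easy to reproduce in isolation (a toy sketch using the shapes from my run; 4097 = 4096 prompt tokens plus the first generated one):

```python
import torch

# The mask buffer is sized by max_position_embeddings -> [1, 1, 4096, 4096].
causal_mask = torch.zeros(1, 1, 4096, 4096)
# The attention mask covers all 4097 tokens seen so far -> [1, 4097].
attention_mask = torch.ones(1, 4097)
mask_length = attention_mask.shape[-1]

# Slicing past the end of the last dim silently keeps it at 4096, so the
# multiply broadcasts [1, 1, 4096, 4096] against [1, 1, 1, 4097] and fails:
# RuntimeError: The size of tensor a (4096) must match the size of tensor b
# (4097) at non-singleton dimension 3
padding_mask = causal_mask[..., :mask_length].eq(0.0) * attention_mask[:, None, None, :].eq(0.0)
```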
The condition `if seq_length > self.causal_mask.shape[-1]:` is never met, because generation uses the KV cache: each decoding step processes a single token, so `inputs_embeds` (`input_tensor`) has shape `[1, 1, 5120]` and `seq_length` is 1.
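The only workaround I have come up with so far is to re-register the mask buffer at the full 16384-token size before generation, mirroring how `LlamaModel.__init__` builds it from `max_position_embeddings` (an untested sketch; I am not sure it has no side effects):

```python
import torch

# `model` is the already loaded vicuna model; 16384 = 4096 * rope_scaling factor.
target_len = 16384

full_mask = torch.full(
    (target_len, target_len), fill_value=True, dtype=torch.bool, device=model.device
)
# The part above the diagonal marks the future positions to mask out, the same
# construction the model uses for the original 4096-position buffer.
model.model.register_buffer(
    "causal_mask", torch.triu(full_mask, diagonal=1), persistent=False
)
```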
Does anyone have ideas on how to fix this error properly?