RuntimeError: The size of tensor a (4096) must match the size of tensor b (4097) at non-singleton dimension 3

During inference with the vicuna-13b-16k model, I get the following error when the context length exceeds 4096 tokens:

RuntimeError: The size of tensor a (4096) must match the size of tensor b (4097) at non-singleton dimension 3

As I understand it, two parameters in the config are supposed to enable a context length (prompt plus model response) of 16384 tokens:

"max_position_embeddings": 4096,
"rope_scaling": {
    "factor": 4.0,
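If I understand the RoPE scaling mechanism correctly, the effective context length should be the product of these two values. A minimal sketch of that arithmetic (assuming linear RoPE scaling, which is the usual interpretation of a plain "factor" entry; the variable names here are my own):

```python
# Effective context length under linear RoPE position scaling:
# positions are divided by `factor`, so the model can address
# max_position_embeddings * factor token positions.
max_position_embeddings = 4096
rope_factor = 4.0

effective_context = int(max_position_embeddings * rope_factor)
print(effective_context)  # 16384
```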

The error occurs in the self._update_causal_mask function, on the following line:

padding_mask = causal_mask[..., :mask_length].eq(0.0) * attention_mask[:, None, None, :].eq(0.0)

causal_mask has dimensions [1, 1, 4096, 4096].
The condition if seq_length > self.causal_mask.shape[-1]: is not met because generation uses the kv-cache, so inputs_embeds (input_tensor) has shape [1, 1, 5120] and seq_length is 1.
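The mismatch can be reproduced in isolation: the statically sized causal mask cannot broadcast against an attention_mask that is one token longer. A scaled-down sketch (using NumPy instead of PyTorch, with 4 standing in for the 4096-position mask and 5 for the 4097-token input; the variable names mirror the transformers source):

```python
import numpy as np

# Toy reproduction of the shape mismatch. The causal mask was allocated
# with a fixed size of max_position_embeddings (here: 4), but the input
# has grown one token past it (here: 5 tokens).
causal_mask = np.zeros((1, 1, 4, 4))   # [bsz, heads, 4096, 4096] in the real model
attention_mask = np.ones((1, 5))       # 4097 tokens in the real model
mask_length = attention_mask.shape[-1]

try:
    # causal_mask[..., :5] still has trailing dim 4 (only 4 columns exist),
    # while attention_mask[:, None, None, :] has trailing dim 5 -> no broadcast.
    padding_mask = (causal_mask[..., :mask_length] == 0.0) * \
                   (attention_mask[:, None, None, :] == 0.0)
except ValueError as e:
    print(e)  # operands could not be broadcast together
```

This matches the reported error: the non-singleton trailing dimensions (4096 vs 4097) cannot be broadcast, only the library raises it as a RuntimeError on tensors rather than a ValueError on arrays.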
Are there any ideas on how to fix this error?

The problem was caused by transformers version 4.38. It was fixed in version 4.44, and the model now works fine.
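For anyone hitting the same error, a quick way to check whether the installed version predates the fix is to compare version tuples. A small sketch (the simplistic parser below is my own helper and ignores rc/dev suffixes; in practice, packaging.version.parse is the more robust choice):

```python
def version_tuple(v: str) -> tuple:
    """Parse a dotted version string like '4.38.2' into (4, 38, 2)."""
    parts = []
    for p in v.split("."):
        digits = "".join(ch for ch in p if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

# Versions below 4.44 are affected by the causal-mask size bug.
print(version_tuple("4.38.2") >= version_tuple("4.44.0"))  # False -> affected
print(version_tuple("4.44.2") >= version_tuple("4.44.0"))  # True  -> fixed
```

In an installed environment the string to compare would come from transformers.__version__, followed by an upgrade (for example, pip install --upgrade transformers) if it is too old.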