TimeSeriesTransformer - mat1 and mat2 shapes cannot be multiplied

Hi, I'm new to this HF model.

I am using some basic, nondescript stock data consisting of three columns: timestamp, price, and volume. I build a time series dataset that uses the past 20 timestamps to predict the next two. See the images below for the full code; here is the error:

File ~/opt/miniforge3/envs/temp/lib/python3.11/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
    113 def forward(self, input: Tensor) -> Tensor:
--> 114     return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (256x22 and 20x64)

I suspect it is related to context_length and lags_sequence: if I change those, the same error appears with different inner numbers that still differ by 2 (e.g. 19 and 17 rather than the current 22 and 20).

I am passing in data of the shape

torch.Size([128, 20])
torch.Size([128, 20, 2])
torch.Size([128, 2])
torch.Size([128, 2, 2])

which I believe is correct per the documentation for univariate time series. This model would benefit greatly from a Medium article or further documentation on how to use it. As for today's issue, I do not believe my code, shown below, is incorrect.

predictions = model(
    past_values=batch['past_values'],
    past_time_features=batch['past_time_features'],
    past_observed_mask=None,
    future_values=batch['future_values'],
    future_time_features=batch['future_time_features'],
)
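The four shapes listed above can be sanity-checked against each other before calling the model. The helper below, implied_config, is hypothetical (not part of transformers); it assumes the univariate layout described in the question and reports the configuration values the shapes imply. Per the HF documentation, the past_values length corresponds to context_length + max(lags_sequence), and the last dimension of the time-feature tensors is the value num_time_features has to match.

```python
def implied_config(past_values, past_time_features, future_values, future_time_features):
    """Hypothetical check; each argument is a shape tuple (torch.Size also works)."""
    batch, past_length = past_values
    if tuple(past_time_features[:2]) != (batch, past_length):
        raise ValueError("past_time_features must be (batch, past_length, num_time_features)")
    if future_values[0] != batch:
        raise ValueError("future_values must share the batch size of past_values")
    prediction_length = future_values[1]
    if tuple(future_time_features[:2]) != (batch, prediction_length):
        raise ValueError("future_time_features must be (batch, prediction_length, num_time_features)")
    if past_time_features[2] != future_time_features[2]:
        raise ValueError("past and future time features must have the same last dimension")
    return {
        "past_length": past_length,  # per the HF docs: context_length + max(lags_sequence)
        "prediction_length": prediction_length,
        "num_time_features": past_time_features[2],  # must match the config value
    }


# Shapes from the question
print(implied_config((128, 20), (128, 20, 2), (128, 2), (128, 2, 2)))
# {'past_length': 20, 'prediction_length': 2, 'num_time_features': 2}
```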

I'll leave the rest of the photos in the comments.

Much love!

Environment:

  • transformers version: 4.35.2
  • Platform: macOS-13.5.1-arm64-arm-64bit
  • Python version: 3.11.5
  • Huggingface_hub version: 0.19.4
  • Safetensors version: 0.4.0
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.0 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Further notes:

Other transformer models with similar matrix issues:

Hello everyone, I am working on multivariate time series forecasting using Transformers, and I am facing the same problem as in the title.

Below is my code, step by step:

After some preprocessing and windowing of the time series dataset…

1 - Creating the masks

input_sequence_length = 10 # encoder input sequence length
target_sequence_length = 5 # decoder input sequence length

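# decoder self-attention mask (passed as tgt_mask)
tgt_mask = generate_square_subsequent_mask(
    dim1=target_sequence_length,
    dim2=target_sequence_length
)
# decoder-to-encoder attention mask (passed to the decoder as memory_mask)
src_mask = generate_square_subsequent_mask(
    dim1=target_sequence_length,
    dim2=input_sequence_length
)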
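generate_square_subsequent_mask(dim1, dim2) is not shown in the post, and PyTorch's own nn.Transformer.generate_square_subsequent_mask only builds a square sz × sz mask. A minimal sketch of what such a rectangular helper commonly looks like, assuming the usual additive float mask with -inf strictly above the main diagonal (this reconstruction is hypothetical):

```python
import torch


def generate_square_subsequent_mask(dim1: int, dim2: int) -> torch.Tensor:
    """Hypothetical reconstruction: additive attention mask of shape (dim1, dim2).

    Entries on or below the main diagonal are 0.0 (attention allowed);
    entries strictly above it are -inf (future positions blocked).
    """
    return torch.triu(torch.full((dim1, dim2), float("-inf")), diagonal=1)


tgt_mask = generate_square_subsequent_mask(5, 5)   # (target_len, target_len)
src_mask = generate_square_subsequent_mask(5, 10)  # (target_len, input_len), used as memory_mask
```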

2 - Positional encoding

import math

import torch
import torch.nn as nn
from torch import Tensor

class PositionalEncoder(nn.Module):
    def __init__(self, dropout: float = 0.1,
        max_seq_len: int = 5000, d_model: int = 512):

        super().__init__()

        self.d_model = d_model
        self.dropout = nn.Dropout(p=dropout)
        self.batch_first = True  # Assuming batch_first is always True

        position = torch.arange(max_seq_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))

        pe = torch.zeros(1, max_seq_len, d_model)
        pe[0, :, 0::2] = torch.sin(position * div_term)
        pe[0, :, 1::2] = torch.cos(position * div_term)

        self.register_buffer('pe', pe)
        
    def forward(self, x: Tensor) -> Tensor:
        x = x + self.pe[:, :x.size(1)]
        return self.dropout(x)
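For reference, the pe buffer above is the standard sinusoidal encoding. torch.arange(0, d_model, 2) produces the even indices 2i, so div_term is just the exponential form of the frequency term:

```latex
\mathrm{div\_term}_i = \exp\!\left(2i \cdot \frac{-\ln 10000}{d_{\text{model}}}\right) = 10000^{-2i/d_{\text{model}}}

PE_{(pos,\,2i)} = \sin\!\left(pos \cdot 10000^{-2i/d_{\text{model}}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\!\left(pos \cdot 10000^{-2i/d_{\text{model}}}\right)
```

This layout assumes d_model is even (512 here); with an odd d_model the 1::2 slice has one column fewer than div_term and the cosine assignment fails.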

3 - Creating the Transformer encoder and decoder with PyTorch

class TimeSeriesTransformer(nn.Module):

    def __init__(self, 
        input_size: int,
        dec_seq_len: int,
        out_seq_len: int= 5, # target_sequence_length
        dim_val: int=512,  
        n_encoder_layers: int=2,
        n_decoder_layers: int=2,
        n_heads: int=4,
        dropout_encoder: float=0.2, 
        dropout_decoder: float=0.2,
        dropout_pos_enc: float=0.1,
        dim_feedforward_encoder: int=512,
        dim_feedforward_decoder: int=512,
        num_predicted_features: int=1
        ): 

        super().__init__() 

        self.dec_seq_len = dec_seq_len

        self.encoder_input_layer = nn.Linear(
            in_features=input_size, 
            out_features=dim_val 
            )

        self.decoder_input_layer = nn.Linear(
            in_features=num_predicted_features,
            out_features=dim_val
            )  
        
        self.linear_mapping = nn.Linear(
            in_features=dim_val, 
            out_features=num_predicted_features
            )

        # Create positional encoder
        self.positional_encoding_layer = PositionalEncoder(
            d_model=dim_val,
            dropout=dropout_pos_enc
            )

        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim_val, 
            nhead=n_heads,
            dim_feedforward=dim_feedforward_encoder,
            dropout=dropout_encoder,
            batch_first=True
            )

        self.encoder = nn.TransformerEncoder(
            encoder_layer=encoder_layer,
            num_layers=n_encoder_layers, 
            norm=None
            )

        decoder_layer = nn.TransformerDecoderLayer(
            d_model=dim_val,
            nhead=n_heads,
            dim_feedforward=dim_feedforward_decoder,
            dropout=dropout_decoder,
            batch_first=True
            )

        self.decoder = nn.TransformerDecoder(
            decoder_layer=decoder_layer,
            num_layers=n_decoder_layers, 
            norm=None
            )

    def forward(self, src: Tensor, tgt: Tensor, src_mask: Tensor=None, 
                tgt_mask: Tensor=None) -> Tensor:

        src = self.encoder_input_layer(src) 
      
        src = self.positional_encoding_layer(src) 
        src = self.encoder(src=src)
        
        decoder_output = self.decoder_input_layer(tgt)
        decoder_output = self.decoder(
            tgt=decoder_output,
            memory=src,
            tgt_mask=tgt_mask,
            memory_mask=src_mask
            )
        decoder_output = self.linear_mapping(decoder_output) 
        
        return decoder_output

4 - Model

model = TimeSeriesTransformer(
    input_size=7,
    dec_seq_len=5,
    num_predicted_features=1,
    ).to(device)

5 - Creating the loader (train_loader was created earlier, in the preprocessing step)

i, batch = next(enumerate(train_loader))
src, trg, trg_y = batch
src = src.to(device) # shape [5, 10, 7]: batch size, encoder sequence length, number of features
trg = trg.to(device) # shape [5, 5, 7]: batch size, decoder sequence length, number of features
trg_y = trg_y.to(device) # shape [5, 5, 1]: batch size, decoder (output) sequence length, number of predicted features

6 - Model output

output = model(
    src=src,
    tgt=trg,
    src_mask=src_mask,
    tgt_mask=tgt_mask
    )

7 - Finally, the error raised is below:

Traceback (most recent call last):

  Cell In[348], line 1
    output = model(

  File C:\ProgramData\anaconda3\Lib\site-packages\torch\nn\modules\module.py:1518 in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)

  File C:\ProgramData\anaconda3\Lib\site-packages\torch\nn\modules\module.py:1527 in _call_impl
    return forward_call(*args, **kwargs)

  Cell In[344], line 80 in forward
    decoder_output = self.decoder_input_layer(tgt)

  File C:\ProgramData\anaconda3\Lib\site-packages\torch\nn\modules\module.py:1518 in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)

  File C:\ProgramData\anaconda3\Lib\site-packages\torch\nn\modules\module.py:1527 in _call_impl
    return forward_call(*args, **kwargs)

  File C:\ProgramData\anaconda3\Lib\site-packages\torch\nn\modules\linear.py:114 in forward
    return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (25x7 and 1x512)
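The shapes in this traceback can be read off directly: F.linear flattens trg of shape [5, 5, 7] into 25x7 (5 × 5 = 25 rows), while decoder_input_layer was built with in_features=num_predicted_features=1, so its transposed weight is 1x512. The sketch below reproduces that with dummy data and shows two possible fixes. Which one is right depends on what the decoder input is meant to contain, and the target column index TARGET_IDX is a hypothetical assumption.

```python
import torch
import torch.nn as nn

trg = torch.randn(5, 5, 7)  # (batch, decoder seq len, features), as in step 5

# decoder_input_layer as built in the model: in_features = num_predicted_features = 1
decoder_input_layer = nn.Linear(in_features=1, out_features=512)
err_msg = None
try:
    decoder_input_layer(trg)  # 25x7 input vs 1x512 transposed weight
except RuntimeError as err:
    err_msg = str(err)

# Option A (hypothetical): feed the decoder only the predicted column.
TARGET_IDX = 0  # hypothetical: position of the target variable in trg
out_a = decoder_input_layer(trg[..., TARGET_IDX:TARGET_IDX + 1])  # (5, 5, 512)

# Option B: size the layer to all 7 decoder input features instead.
wide_decoder_input_layer = nn.Linear(in_features=trg.shape[-1], out_features=512)
out_b = wide_decoder_input_layer(trg)  # (5, 5, 512)
```

Option A keeps the model as written and matches trg_y, which also has a single feature; Option B keeps all seven features in the decoder input.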

Any assistance would be greatly appreciated. Thanks, everyone!

Hi,

I know this is an old post, but for the benefit of anyone having the same problem, here is an explanation.

I believe you should initialize the transformer configuration with num_time_features = 2, which is the last dimension of your time-feature tensors. Otherwise this parameter defaults to 0, and some layers in the transformer are created with the wrong size.
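A minimal configuration sketch of this suggestion, assuming the transformers TimeSeriesTransformerConfig and TimeSeriesTransformerForPrediction classes. Only prediction_length=2 and num_time_features=2 come from the thread; context_length and lags_sequence are hypothetical values chosen so that context_length + max(lags_sequence) equals the 20 past steps in the question.

```python
from transformers import TimeSeriesTransformerConfig, TimeSeriesTransformerForPrediction

config = TimeSeriesTransformerConfig(
    prediction_length=2,      # predict the next two timestamps
    context_length=17,        # hypothetical: 17 + max(lags_sequence) == 20 past steps
    lags_sequence=[1, 2, 3],  # hypothetical lag indices
    num_time_features=2,      # must equal past_time_features.shape[-1]; defaults to 0
)
model = TimeSeriesTransformerForPrediction(config)
```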

Here is the detailed explanation, as can be seen from the transformer's source code.

The first thing TimeSeriesTransformerModel does in its forward method is pack the input tensors into a single tensor in the function create_network_inputs; it then projects the last dimension through a linear layer (value_embedding) so that the last dimension becomes 'd_model' for the subsequent multi-head attention layers.
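To make the packing step concrete, here is a simplified sketch in plain PyTorch. It is not the library's create_network_inputs: the concatenation order is illustrative, and the widths (18 lags, 2 loc/scale summary columns, 2 time features, 2 time steps) are hypothetical values picked to reproduce the 256x22 input in the error. It shows that a projection built for 20 input features fails on the packed tensor, while one sized from the packed width works.

```python
import torch
import torch.nn as nn

batch, steps, d_model = 128, 2, 64   # hypothetical: 128 * 2 = 256 rows once flattened
num_lags, num_time_features = 18, 2  # hypothetical widths

lagged = torch.randn(batch, steps, num_lags)              # lagged target values
static = torch.randn(batch, 1, 2).expand(-1, steps, -1)  # loc/scale summaries, repeated per step
time_feat = torch.randn(batch, steps, num_time_features) # e.g. past_time_features

# Pack everything along the last dimension: width 18 + 2 + 2 = 22
transformer_inputs = torch.cat([lagged, static, time_feat], dim=-1)

# A projection sized as if num_time_features were 0 expects only 20 features
too_narrow = nn.Linear(20, d_model)
try:
    too_narrow(transformer_inputs)
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True  # shape mismatch, like the error in the question

# Sizing the projection from the packed width makes the multiplication valid
value_embedding = nn.Linear(transformer_inputs.shape[-1], d_model)
embedded = value_embedding(transformer_inputs)  # shape (128, 2, 64)
```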

This projection linear layer (mat2) is created when the transformer is initialized, with size:

(input_size * len(lags_sequence) + sum(self.embedding_dimension) + self.num_dynamic_real_features + self.num_time_features + self.num_static_real_features + self.input_size * 2, d_model)

in the __init__() method of value_embedding, an instance of the TimeSeriesValueEmbedding class.
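The size formula above can be written as a small function to check a configuration by hand. value_embedding_in_features is a hypothetical name; the defaults of 0 (and 1 for input_size) follow the description in this reply, and the 18-element lags_sequence in the usage lines is hypothetical, just one choice that reproduces the 20 and 22 from the error.

```python
def value_embedding_in_features(
    lags_sequence,
    input_size=1,
    embedding_dimension=(),
    num_dynamic_real_features=0,
    num_time_features=0,
    num_static_real_features=0,
):
    """First dimension of mat2, following the size formula above."""
    return (
        input_size * len(lags_sequence)
        + sum(embedding_dimension)
        + num_dynamic_real_features
        + num_time_features
        + num_static_real_features
        + input_size * 2  # the two scaling summaries (loc and scale) per input variable
    )


lags = list(range(1, 19))  # hypothetical 18-element lags_sequence
print(value_embedding_in_features(lags))                       # 20: num_time_features left at 0
print(value_embedding_in_features(lags, num_time_features=2))  # 22: matches the packed input
```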

However, 'create_network_inputs' builds its output from the actual inputs to the transformer: it collates the values, computes the lags, standardizes, expands the tensors provided, etc. In your case the output of this function, 'transformer_inputs' (mat1), has size (256, 22).

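In your case the projection linear layer has size (20, 64), but it should be (22, 64) for the inputs to match the projection layer's dimensions in the multiplication. The 2 missing columns are exactly your two time features, which is why the two inner numbers in the error always differ by 2.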

There are other configuration parameters that could cause the same problem because they also default to 0 (e.g. num_dynamic_real_features, num_static_categorical_features and num_static_real_features) or default to 1, like input_size.

The transformer code could be improved by checking the last dimension of each input against the corresponding configuration parameters and raising a clear error whenever they disagree.
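A minimal sketch of the kind of check suggested above. check_last_dim is a hypothetical helper, not an existing transformers function; it compares one input's last dimension with the config value it should match and raises an error that names the parameter to fix.

```python
def check_last_dim(input_name, shape, param_name, param_value):
    """Hypothetical guard: raise if shape[-1] disagrees with a config parameter."""
    actual = shape[-1]
    if actual != param_value:
        raise ValueError(
            f"{input_name} has last dimension {actual}, but config.{param_name} = {param_value}; "
            f"set {param_name}={actual} when building the config"
        )


# With the question's time features and num_time_features left at its default of 0:
try:
    check_last_dim("past_time_features", (128, 20, 2), "num_time_features", 0)
except ValueError as err:
    print(err)
```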

Zulok