TimeSeriesTransformer - mat1 and mat2 shapes cannot be multiplied

Hi, new to this HF model.

I am using some basic nondescript stock data consisting of three columns: timestamp, price, and volume. I create a time series dataset by using the past 20 timestamps to predict the next two. See the below images for the full code, but here is the error:

File ~/opt/miniforge3/envs/temp/lib/python3.11/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
    113 def forward(self, input: Tensor) -> Tensor:
--> 114     return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (256x22 and 20x64)

I happen to think it is related to context_length and lags_sequence since if i change those around, the same error with different inside numbers differing by 2 appear (e.g. 19 & 17 rather than what it is now: 22 & 20).

I am passing in data of the shape

torch.Size([128, 20])
torch.Size([128, 20, 2])
torch.Size([128, 2])
torch.Size([128, 2, 2])

which I believe is correct per the documentation for univariate time series. This model would benefit greatly from a medium article or further documentation on how to use it. But for today鈥檚 issue, I do not believe my code is incorrect as shown below.

predictions = model(
            past_values=batch['past_values'],
            past_time_features=batch['past_time_features'],
            past_observed_mask=None,
            future_values=batch['future_values'],
            future_time_features=batch['future_time_features'],
            
        )

I鈥檒l leave the rest of the photos in the comments.

Much love!

Environment:

  • transformers version: 4.35.2
  • Platform: macOS-13.5.1-arm64-arm-64bit
  • Python version: 3.11.5
  • Huggingface_hub version: 0.19.4
  • Safetensors version: 0.4.0
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.0 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Further notes:

Other transformer models with similar matrix issues:

Hellow to everyone I am dealing with Multivariate time series forecasting using Transformers

I am faced the same problem on the title
I am going to provide the step by step of my coder

Hello to everyone, I am dealing with multivariate time series forecasting using Transformers.
below is my code step by step:

After some preprocessing and windowing time series dataset 鈥

1- Creating Mask function

input_sequence_length = 10 # incoder input sequence
target_sequence_length = 5 # decoder input sequence

tgt_mask = generate_square_subsequent_mask(
    dim1=target_sequence_length,
    dim2=target_sequence_length
   )
src_mask = generate_square_subsequent_mask(
    dim1=target_sequence_length,
    dim2=input_sequence_length
   )

2- Positional Encoding

class PositionalEncoder(nn.Module):
    def __init__(self, dropout: float = 0.1, 
        max_seq_len: int = 5000, d_model: int = 512,device = device):

        super().__init__()

        self.d_model = d_model
        self.dropout = nn.Dropout(p=dropout)
        self.batch_first = True  # Assuming batch_first is always True

        position = torch.arange(max_seq_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))

        pe = torch.zeros(1, max_seq_len, d_model)
        pe[0, :, 0::2] = torch.sin(position * div_term)
        pe[0, :, 1::2] = torch.cos(position * div_term)

        self.register_buffer('pe', pe)
        
    def forward(self, x: Tensor) -> Tensor:
        x = x + self.pe[:, :x.size(1)]
        return self.dropout(x)

3 - Creating Transformers Encoder and Decoder with Pytorch

class TimeSeriesTransformer(nn.Module):

    def __init__(self, 
        input_size: int,
        dec_seq_len: int,
        out_seq_len: int= 5, # target_sequence_length
        dim_val: int=512,  
        n_encoder_layers: int=2,
        n_decoder_layers: int=2,
        n_heads: int=4,
        dropout_encoder: float=0.2, 
        dropout_decoder: float=0.2,
        dropout_pos_enc: float=0.1,
        dim_feedforward_encoder: int=512,
        dim_feedforward_decoder: int=512,
        num_predicted_features: int=1
        ): 

        super().__init__() 

        self.dec_seq_len = dec_seq_len

        self.encoder_input_layer = nn.Linear(
            in_features=input_size, 
            out_features=dim_val 
            )

        self.decoder_input_layer = nn.Linear(
            in_features=num_predicted_features,
            out_features=dim_val
            )  
        
        self.linear_mapping = nn.Linear(
            in_features=dim_val, 
            out_features=num_predicted_features
            )

        # Create positional encoder
        self.positional_encoding_layer = PositionalEncoder(
            d_model=dim_val,
            dropout=dropout_pos_enc
            )

        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim_val, 
            nhead=n_heads,
            dim_feedforward=dim_feedforward_encoder,
            dropout=dropout_encoder,
            batch_first=True
            )

        self.encoder = nn.TransformerEncoder(
            encoder_layer=encoder_layer,
            num_layers=n_encoder_layers, 
            norm=None
            )

        decoder_layer = nn.TransformerDecoderLayer(
            d_model=dim_val,
            nhead=n_heads,
            dim_feedforward=dim_feedforward_decoder,
            dropout=dropout_decoder,
            batch_first=True
            )

        self.decoder = nn.TransformerDecoder(
            decoder_layer=decoder_layer,
            num_layers=n_decoder_layers, 
            norm=None
            )

    def forward(self, src: Tensor, tgt: Tensor, src_mask: Tensor=None, 
                tgt_mask: Tensor=None) -> Tensor:

        src = self.encoder_input_layer(src) 
      
        src = self.positional_encoding_layer(src) 
        src = self.encoder(src=src)
        
        decoder_output = self.decoder_input_layer(tgt)
        decoder_output = self.decoder(
            tgt=decoder_output,
            memory=src,
            tgt_mask=tgt_mask,
            memory_mask=src_mask
            )
        decoder_output = self.linear_mapping(decoder_output) 
        
        return decoder_output

4 - model

model = TimeSeriesTransformer(
    input_size=7,
    dec_seq_len=5,
    num_predicted_features=1,
    ).to(device)

5 - creating loader # befor created in the preprocessing step

i, batch = next(enumerate(train_loader))
src, trg, trg_y = batch
src = src.to(device) # shape [5 , 10 , 7] , batch size , encoder sequence len , number of feature
trg = trg.to(device) # shape [5 , 5 , 7], batch size , decoder sequence len , number of feature
trg_y = trg_y.to(device) # [5 , 5 , 1] , batch size , deocder or output sequence len , number predicted feature

6 - output of the model

output = model(
    src=src,
    tgt=trg,
    src_mask=src_mask,
    tgt_mask=tgt_mask
    )

7 - Finally the raised error is like below

output = model(
    src=src,
    tgt=trg,
    src_mask=src_mask,
    tgt_mask=tgt_mask
    )
Traceback (most recent call last):

  Cell In[348], line 1
    output = model(

  File C:\ProgramData\anaconda3\Lib\site-packages\torch\nn\modules\module.py:1518 in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)

  File C:\ProgramData\anaconda3\Lib\site-packages\torch\nn\modules\module.py:1527 in _call_impl
    return forward_call(*args, **kwargs)

  Cell In[344], line 80 in forward
    decoder_output = self.decoder_input_layer(tgt)

  File C:\ProgramData\anaconda3\Lib\site-packages\torch\nn\modules\module.py:1518 in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)

  File C:\ProgramData\anaconda3\Lib\site-packages\torch\nn\modules\module.py:1527 in _call_impl
    return forward_call(*args, **kwargs)

  File C:\ProgramData\anaconda3\Lib\site-packages\torch\nn\modules\linear.py:114 in forward
    return F.linear(input, self.weight, self.bias)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (25x7 and 1x512)

Any assistance will be great appriciate thanks from all