Tensor size mismatch when using Informer

Hi folks! I’m struggling with an error in some code that I inherited and have been working on. I don’t fully understand how the package works, so I may be making a silly mistake. I trained an Informer model on my data, but when I call model.generate() to get predictions, I get this error:

Cell In[14], line 14
      7 for batch in test_dataloader:
      8     with torch.no_grad():
      9         # print(batch["past_time_features"].to(device).shape)
     10         # print(batch["past_values"].to(device).shape)
     11         # print(batch["future_time_features"].to(device).shape)
     12         # print(batch["past_observed_mask"].to(device).shape)
---> 14         outputs = model.generate(
     15             static_categorical_features=batch["static_categorical_features"].to(device)
     16             if model_config.num_static_categorical_features > 0
     17             else None,
     18             static_real_features=batch["static_real_features"].to(device)
     19             if model_config.num_static_real_features > 0
     20             else None,
     21             past_time_features=batch["past_time_features"].to(device),
     22             past_values=batch["past_values"].to(device),
     23             future_time_features=batch["future_time_features"].to(device),
     24             past_observed_mask=batch["past_observed_mask"].to(device),
     25         )
     26         forecasts_.append(outputs.sequences.cpu().numpy())

File ~/.conda/envs/pytorch-1.13.1/lib/python3.11/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/.conda/envs/pytorch-1.13.1/lib/python3.11/site-packages/transformers/models/informer/modeling_informer.py:2076, in InformerForPrediction.generate(self, past_values, past_time_features, future_time_features, past_observed_mask, static_categorical_features, static_real_features, output_attentions, output_hidden_states)
   2073 #SAM ADD. DONT FORGET TO REMOVE
   2074 #print(lagged_sequence)
   2075 print(reshaped_lagged_sequence.shape, repeated_features[:, : k + 1].shape)
-> 2076 decoder_input = torch.cat((reshaped_lagged_sequence, repeated_features[:, : k + 1]), dim=-1)
   2078 dec_output = decoder(inputs_embeds=decoder_input, encoder_hidden_states=repeated_enc_last_hidden)
   2079 dec_last_hidden = dec_output.last_hidden_state

RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 70 but got size 1 for tensor number 1 in the list.

I printed the shapes of these two tensors: reshaped_lagged_sequence is [51200, 70, 6], which looks correct to me, whereas the slice repeated_features[:, : k + 1] is [51200, 1, 14], which looks wrong.
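
As a sanity check on how to read the error: torch.cat requires every dimension other than the concatenation dimension to match. Below is a minimal sketch of that rule in plain Python (no PyTorch needed). Note that `cat_mismatches` is a hypothetical helper written only for illustration, not part of any library. It is applied to the two shapes printed above.

```python
def cat_mismatches(shapes, dim):
    """Return (axis, sizes) pairs for axes that would make concatenation along `dim` fail."""
    ndim = len(shapes[0])
    if any(len(s) != ndim for s in shapes):
        raise ValueError("all shapes must have the same number of dimensions")
    cat_axis = dim % ndim  # normalise negative dims such as -1
    mismatches = []
    for axis in range(ndim):
        if axis == cat_axis:
            continue  # sizes may differ freely along the concatenation axis
        sizes = [s[axis] for s in shapes]
        if len(set(sizes)) > 1:
            mismatches.append((axis, sizes))
    return mismatches


lagged_shape = (51200, 70, 6)    # reshaped_lagged_sequence
features_shape = (51200, 1, 14)  # repeated_features[:, : k + 1]

print(cat_mismatches([lagged_shape, features_shape], dim=-1))  # prints [(1, [70, 1])]
```

The only conflicting axis is axis 1 (the time axis), 70 versus 1, which matches the sizes reported in the RuntimeError. The differing last axis (6 versus 14) is fine because that is the axis being concatenated.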

Here’s the code I used to load the model, build the config, and create the dataloader:

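import torch
import yaml
from accelerate import Accelerator
from transformers import InformerConfig, InformerForPrediction

# create_train_dataloader, test_dataset, and month_of_year are defined elsewhere (not shown)
accelerator = Accelerator()
device = accelerator.device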

num_variates = 6

model = InformerForPrediction.from_pretrained('anomaly_detection/70-110_pretrained_model_new/hf_model')

model.to(device)
model.eval()

with open("bigger_model_hyperparameters.yml", 'r') as f:
    config = yaml.safe_load(f)

model_config = InformerConfig(
    input_size=num_variates,
    has_labels=False,
    prediction_length=110,
    context_length=70,
    lags_sequence=[0],
    num_time_features=len(config['time_features']) + 1,
    dropout=0.2,
    encoder_layers=config['num_encoder_layers'],
    decoder_layers=config['num_decoder_layers'],
    d_model=config['d_model']
)

test_dataloader = create_train_dataloader(
    config=model_config,
    dataset=test_dataset,
    time_features=[month_of_year if x == 'month_of_year' else None for x in config['time_features']],
    batch_size=512,
    num_batches_per_epoch=10,
    add_objid=True,
) 

Then, here’s the code I used to try to generate the forecasts:

context = 70
prediction = 110
model.eval()

forecasts_ = []

for batch in test_dataloader:
    with torch.no_grad():
        outputs = model.generate(
            static_categorical_features=batch["static_categorical_features"].to(device)
            if model_config.num_static_categorical_features > 0
            else None,
            static_real_features=batch["static_real_features"].to(device)
            if model_config.num_static_real_features > 0
            else None,
            past_time_features=batch["past_time_features"].to(device),
            past_values=batch["past_values"].to(device),
            future_time_features=batch["future_time_features"].to(device),
            past_observed_mask=batch["past_observed_mask"].to(device),
        )
        forecasts_.append(outputs.sequences.cpu().numpy())
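
To check the inputs before they reach generate(), a small helper along these lines could report the shape of every tensor in a batch. This is only a sketch: `summarize_batch` is a hypothetical name, it relies solely on each value exposing a `.shape` attribute, and the stand-in batch below uses placeholder objects rather than real data.

```python
from types import SimpleNamespace


def summarize_batch(batch):
    """Map each key to its shape as a tuple, or None if the value has no .shape."""
    summary = {}
    for key, value in batch.items():
        shape = getattr(value, "shape", None)
        summary[key] = tuple(shape) if shape is not None else None
    return summary


# Stand-in batch: the shapes are illustrative placeholders derived from
# batch_size=512, context_length=70, and num_variates=6 above, not real output.
fake_batch = {
    "past_values": SimpleNamespace(shape=(512, 70, 6)),
    "past_observed_mask": SimpleNamespace(shape=(512, 70, 6)),
    "static_real_features": None,
}

for name, shape in summarize_batch(fake_batch).items():
    print(f"{name}: {shape}")
```

Inside the loop above, calling summarize_batch(batch) just before model.generate() would replace the commented-out print statements shown in the traceback and make it easier to compare each input's time dimension against the model's config.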

Also, if it’s relevant, here’s the output of transformers-cli env:

- `transformers` version: 4.27.4
- Platform: Linux-5.14.21-150400.24.81_12.0.86-cray_shasta_c-x86_64-with-glibc2.31
- Python version: 3.11.2
- Huggingface_hub version: 0.13.3
- PyTorch version (GPU?): 2.1.0 (True)

I’m running this on the NERSC platform, which is why the Platform line shows that value.
I’ve tried adjusting various parameters, adding actual lags, changing devices, and re-training the model, but nothing seems to affect the size of this tensor. Let me know if you need any more details. Any help would be appreciated!