Training IndicBART: TypeError: forward() got an unexpected keyword argument 'additional_input_ids'

While fine-tuning the IndicBART model following this repo, running the command below on Google Colab,

!python train_nmt.py --train_slang hi --train_tlang hi --dev_slang hi --dev_tlang hi \
    --train_src train.text.hi --train_tgt train.summary.hi --dev_src dev.text.hi \
    --dev_tgt dev.summary.hi --model_path model.ft --encoder_layers 6 --decoder_layers 6 \
    --label_smoothing 0.1 --dropout 0.1 --attention_dropout 0.1 --activation_dropout 0.1 \
    --encoder_attention_heads 16 --decoder_attention_heads 16 --encoder_ffn_dim 4096 \
    --decoder_ffn_dim 4096 --d_model 1024 --tokenizer_name_or_path albert-indicunified64k \
    --warmup_steps 16000 --weight_decay 0.00001 --lr 0.0003 --max_gradient_clip_value 1.0 \
    --dev_batch_size 64 --port 22222 --shard_files --hard_truncate_length 512 \
    --pretrained_model indicbart_model.ckpt --max_src_length 384 --max_tgt_length 40 \
    --is_summarization --max_decode_length_multiplier -60 \
    --min_decode_length_multiplier -10 --no_repeat_ngram_size 4 --length_penalty 1.0 \
    --max_eval_batches 20

I encountered the following error:

TypeError: forward() got an unexpected keyword argument 'additional_input_ids'

Complete output log:

IP address is localhost
Training files are: {'hi-hi': ('train.text.hi', 'train.summary.hi')}
Development files are: {'hi-hi': ('dev.text.hi', 'dev.summary.hi')}
Launching process: 0
Sharding files into 1 parts
For language pair: hi-hi  the total number of lines are: 100 and number of lines per shard are: 100
File for language pair hi-hi has been sharded.
[W ProcessGroupNCCL.cpp:1569] Rank 0 using best-guess GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
Tokenizer is: PreTrainedTokenizer(name_or_path='albert-indicunified64k', vocab_size=64000, model_max_len=1000000000000000019884624838656, is_fast=False, padding_side='right', special_tokens={'bos_token': '[CLS]', 'eos_token': '[SEP]', 'unk_token': '<unk>', 'sep_token': '[SEP]', 'pad_token': '<pad>', 'cls_token': '[CLS]', 'mask_token': AddedToken("[MASK]", rstrip=False, lstrip=True, single_word=False, normalized=True), 'additional_special_tokens': ['<s>', '</s>', '<2as>', '<2bn>', '<2en>', '<2gu>', '<2hi>', '<2kn>', '<2ml>', '<2mr>', '<2or>', '<2pa>', '<2ta>', '<2te>']})
Running DDP checkpoint example on rank 0.
We will do fp32 training
/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:247: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
  warnings.warn("To get the last learning rate computed by the scheduler, "
/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Initial LR is: 1.125e-07
Loading from checkpoint. Strict loading by default but if there are missing or non matching keys, they will be ignored when layer remapping or component selection is done.
Remapping layers from parent to child.
Final model dictionary after remapping is: odict_keys(['final_logits_bias', 'model.shared.weight', 'model.encoder.embed_tokens.weight', 'model.encoder.embed_positions.weight', 'model.encoder.layers.0.self_attn.k_proj.weight', 'model.encoder.layers.0.self_attn.k_proj.bias', 'model.encoder.layers.0.self_attn.v_proj.weight', 'model.encoder.layers.0.self_attn.v_proj.bias', 'model.encoder.layers.0.self_attn.q_proj.weight', 'model.encoder.layers.0.self_attn.q_proj.bias', 'model.encoder.layers.0.self_attn.out_proj.weight', 'model.encoder.layers.0.self_attn.out_proj.bias', 'model.encoder.layers.0.self_attn_layer_norm.weight', 'model.encoder.layers.0.self_attn_layer_norm.bias', 'model.encoder.layers.0.fc1.weight', 'model.encoder.layers.0.fc1.bias', 'model.encoder.layers.0.fc2.weight', 'model.encoder.layers.0.fc2.bias', 'model.encoder.layers.0.final_layer_norm.weight', 'model.encoder.layers.0.final_layer_norm.bias', 'model.encoder.layers.1.self_attn.k_proj.weight', 'model.encoder.layers.1.self_attn.k_proj.bias', 'model.encoder.layers.1.self_attn.v_proj.weight', 'model.encoder.layers.1.self_attn.v_proj.bias', 'model.encoder.layers.1.self_attn.q_proj.weight', 'model.encoder.layers.1.self_attn.q_proj.bias', 'model.encoder.layers.1.self_attn.out_proj.weight', 'model.encoder.layers.1.self_attn.out_proj.bias', 'model.encoder.layers.1.self_attn_layer_norm.weight', 'model.encoder.layers.1.self_attn_layer_norm.bias', 'model.encoder.layers.1.fc1.weight', 'model.encoder.layers.1.fc1.bias', 'model.encoder.layers.1.fc2.weight', 'model.encoder.layers.1.fc2.bias', 'model.encoder.layers.1.final_layer_norm.weight', 'model.encoder.layers.1.final_layer_norm.bias', 'model.encoder.layers.2.self_attn.k_proj.weight', 'model.encoder.layers.2.self_attn.k_proj.bias', 'model.encoder.layers.2.self_attn.v_proj.weight', 'model.encoder.layers.2.self_attn.v_proj.bias', 'model.encoder.layers.2.self_attn.q_proj.weight', 'model.encoder.layers.2.self_attn.q_proj.bias', 'model.encoder.layers.2.self_attn.out_proj.weight', 'model.encoder.layers.2.self_attn.out_proj.bias', 'model.encoder.layers.2.self_attn_layer_norm.weight', 'model.encoder.layers.2.self_attn_layer_norm.bias', 'model.encoder.layers.2.fc1.weight', 'model.encoder.layers.2.fc1.bias', 'model.encoder.layers.2.fc2.weight', 'model.encoder.layers.2.fc2.bias', 'model.encoder.layers.2.final_layer_norm.weight', 'model.encoder.layers.2.final_layer_norm.bias', 'model.encoder.layers.3.self_attn.k_proj.weight', 'model.encoder.layers.3.self_attn.k_proj.bias', 'model.encoder.layers.3.self_attn.v_proj.weight', 'model.encoder.layers.3.self_attn.v_proj.bias', 'model.encoder.layers.3.self_attn.q_proj.weight', 'model.encoder.layers.3.self_attn.q_proj.bias', 'model.encoder.layers.3.self_attn.out_proj.weight', 'model.encoder.layers.3.self_attn.out_proj.bias', 'model.encoder.layers.3.self_attn_layer_norm.weight', 'model.encoder.layers.3.self_attn_layer_norm.bias', 'model.encoder.layers.3.fc1.weight', 'model.encoder.layers.3.fc1.bias', 'model.encoder.layers.3.fc2.weight', 'model.encoder.layers.3.fc2.bias', 'model.encoder.layers.3.final_layer_norm.weight', 'model.encoder.layers.3.final_layer_norm.bias', 'model.encoder.layers.4.self_attn.k_proj.weight', 'model.encoder.layers.4.self_attn.k_proj.bias', 'model.encoder.layers.4.self_attn.v_proj.weight', 'model.encoder.layers.4.self_attn.v_proj.bias', 'model.encoder.layers.4.self_attn.q_proj.weight', 'model.encoder.layers.4.self_attn.q_proj.bias', 'model.encoder.layers.4.self_attn.out_proj.weight', 'model.encoder.layers.4.self_attn.out_proj.bias', 
'model.encoder.layers.4.self_attn_layer_norm.weight', 'model.encoder.layers.4.self_attn_layer_norm.bias', 'model.encoder.layers.4.fc1.weight', 'model.encoder.layers.4.fc1.bias', 'model.encoder.layers.4.fc2.weight', 'model.encoder.layers.4.fc2.bias', 'model.encoder.layers.4.final_layer_norm.weight', 'model.encoder.layers.4.final_layer_norm.bias', 'model.encoder.layers.5.self_attn.k_proj.weight', 'model.encoder.layers.5.self_attn.k_proj.bias', 'model.encoder.layers.5.self_attn.v_proj.weight', 'model.encoder.layers.5.self_attn.v_proj.bias', 'model.encoder.layers.5.self_attn.q_proj.weight', 'model.encoder.layers.5.self_attn.q_proj.bias', 'model.encoder.layers.5.self_attn.out_proj.weight', 'model.encoder.layers.5.self_attn.out_proj.bias', 'model.encoder.layers.5.self_attn_layer_norm.weight', 'model.encoder.layers.5.self_attn_layer_norm.bias', 'model.encoder.layers.5.fc1.weight', 'model.encoder.layers.5.fc1.bias', 'model.encoder.layers.5.fc2.weight', 'model.encoder.layers.5.fc2.bias', 'model.encoder.layers.5.final_layer_norm.weight', 'model.encoder.layers.5.final_layer_norm.bias', 'model.encoder.layernorm_embedding.weight', 'model.encoder.layernorm_embedding.bias', 'model.encoder.layer_norm.weight', 'model.encoder.layer_norm.bias', 'model.decoder.embed_tokens.weight', 'model.decoder.embed_positions.weight', 'model.decoder.layers.0.self_attn.k_proj.weight', 'model.decoder.layers.0.self_attn.k_proj.bias', 'model.decoder.layers.0.self_attn.v_proj.weight', 'model.decoder.layers.0.self_attn.v_proj.bias', 'model.decoder.layers.0.self_attn.q_proj.weight', 'model.decoder.layers.0.self_attn.q_proj.bias', 'model.decoder.layers.0.self_attn.out_proj.weight', 'model.decoder.layers.0.self_attn.out_proj.bias', 'model.decoder.layers.0.self_attn_layer_norm.weight', 'model.decoder.layers.0.self_attn_layer_norm.bias', 'model.decoder.layers.0.encoder_attn.k_proj.weight', 'model.decoder.layers.0.encoder_attn.k_proj.bias', 'model.decoder.layers.0.encoder_attn.v_proj.weight', 'model.decoder.layers.0.encoder_attn.v_proj.bias', 'model.decoder.layers.0.encoder_attn.q_proj.weight', 'model.decoder.layers.0.encoder_attn.q_proj.bias', 'model.decoder.layers.0.encoder_attn.out_proj.weight', 'model.decoder.layers.0.encoder_attn.out_proj.bias', 'model.decoder.layers.0.encoder_attn_layer_norm.weight', 'model.decoder.layers.0.encoder_attn_layer_norm.bias', 'model.decoder.layers.0.fc1.weight', 'model.decoder.layers.0.fc1.bias', 'model.decoder.layers.0.fc2.weight', 'model.decoder.layers.0.fc2.bias', 'model.decoder.layers.0.final_layer_norm.weight', 'model.decoder.layers.0.final_layer_norm.bias', 'model.decoder.layers.1.self_attn.k_proj.weight', 'model.decoder.layers.1.self_attn.k_proj.bias', 'model.decoder.layers.1.self_attn.v_proj.weight', 'model.decoder.layers.1.self_attn.v_proj.bias', 'model.decoder.layers.1.self_attn.q_proj.weight', 'model.decoder.layers.1.self_attn.q_proj.bias', 'model.decoder.layers.1.self_attn.out_proj.weight', 'model.decoder.layers.1.self_attn.out_proj.bias', 'model.decoder.layers.1.self_attn_layer_norm.weight', 'model.decoder.layers.1.self_attn_layer_norm.bias', 'model.decoder.layers.1.encoder_attn.k_proj.weight', 'model.decoder.layers.1.encoder_attn.k_proj.bias', 'model.decoder.layers.1.encoder_attn.v_proj.weight', 'model.decoder.layers.1.encoder_attn.v_proj.bias', 'model.decoder.layers.1.encoder_attn.q_proj.weight', 'model.decoder.layers.1.encoder_attn.q_proj.bias', 'model.decoder.layers.1.encoder_attn.out_proj.weight', 'model.decoder.layers.1.encoder_attn.out_proj.bias', 
'model.decoder.layers.1.encoder_attn_layer_norm.weight', 'model.decoder.layers.1.encoder_attn_layer_norm.bias', 'model.decoder.layers.1.fc1.weight', 'model.decoder.layers.1.fc1.bias', 'model.decoder.layers.1.fc2.weight', 'model.decoder.layers.1.fc2.bias', 'model.decoder.layers.1.final_layer_norm.weight', 'model.decoder.layers.1.final_layer_norm.bias', 'model.decoder.layers.2.self_attn.k_proj.weight', 'model.decoder.layers.2.self_attn.k_proj.bias', 'model.decoder.layers.2.self_attn.v_proj.weight', 'model.decoder.layers.2.self_attn.v_proj.bias', 'model.decoder.layers.2.self_attn.q_proj.weight', 'model.decoder.layers.2.self_attn.q_proj.bias', 'model.decoder.layers.2.self_attn.out_proj.weight', 'model.decoder.layers.2.self_attn.out_proj.bias', 'model.decoder.layers.2.self_attn_layer_norm.weight', 'model.decoder.layers.2.self_attn_layer_norm.bias', 'model.decoder.layers.2.encoder_attn.k_proj.weight', 'model.decoder.layers.2.encoder_attn.k_proj.bias', 'model.decoder.layers.2.encoder_attn.v_proj.weight', 'model.decoder.layers.2.encoder_attn.v_proj.bias', 'model.decoder.layers.2.encoder_attn.q_proj.weight', 'model.decoder.layers.2.encoder_attn.q_proj.bias', 'model.decoder.layers.2.encoder_attn.out_proj.weight', 'model.decoder.layers.2.encoder_attn.out_proj.bias', 'model.decoder.layers.2.encoder_attn_layer_norm.weight', 'model.decoder.layers.2.encoder_attn_layer_norm.bias', 'model.decoder.layers.2.fc1.weight', 'model.decoder.layers.2.fc1.bias', 'model.decoder.layers.2.fc2.weight', 'model.decoder.layers.2.fc2.bias', 'model.decoder.layers.2.final_layer_norm.weight', 'model.decoder.layers.2.final_layer_norm.bias', 'model.decoder.layers.3.self_attn.k_proj.weight', 'model.decoder.layers.3.self_attn.k_proj.bias', 'model.decoder.layers.3.self_attn.v_proj.weight', 'model.decoder.layers.3.self_attn.v_proj.bias', 'model.decoder.layers.3.self_attn.q_proj.weight', 'model.decoder.layers.3.self_attn.q_proj.bias', 'model.decoder.layers.3.self_attn.out_proj.weight', 'model.decoder.layers.3.self_attn.out_proj.bias', 'model.decoder.layers.3.self_attn_layer_norm.weight', 'model.decoder.layers.3.self_attn_layer_norm.bias', 'model.decoder.layers.3.encoder_attn.k_proj.weight', 'model.decoder.layers.3.encoder_attn.k_proj.bias', 'model.decoder.layers.3.encoder_attn.v_proj.weight', 'model.decoder.layers.3.encoder_attn.v_proj.bias', 'model.decoder.layers.3.encoder_attn.q_proj.weight', 'model.decoder.layers.3.encoder_attn.q_proj.bias', 'model.decoder.layers.3.encoder_attn.out_proj.weight', 'model.decoder.layers.3.encoder_attn.out_proj.bias', 'model.decoder.layers.3.encoder_attn_layer_norm.weight', 'model.decoder.layers.3.encoder_attn_layer_norm.bias', 'model.decoder.layers.3.fc1.weight', 'model.decoder.layers.3.fc1.bias', 'model.decoder.layers.3.fc2.weight', 'model.decoder.layers.3.fc2.bias', 'model.decoder.layers.3.final_layer_norm.weight', 'model.decoder.layers.3.final_layer_norm.bias', 'model.decoder.layers.4.self_attn.k_proj.weight', 'model.decoder.layers.4.self_attn.k_proj.bias', 'model.decoder.layers.4.self_attn.v_proj.weight', 'model.decoder.layers.4.self_attn.v_proj.bias', 'model.decoder.layers.4.self_attn.q_proj.weight', 'model.decoder.layers.4.self_attn.q_proj.bias', 'model.decoder.layers.4.self_attn.out_proj.weight', 'model.decoder.layers.4.self_attn.out_proj.bias', 'model.decoder.layers.4.self_attn_layer_norm.weight', 'model.decoder.layers.4.self_attn_layer_norm.bias', 'model.decoder.layers.4.encoder_attn.k_proj.weight', 'model.decoder.layers.4.encoder_attn.k_proj.bias', 
'model.decoder.layers.4.encoder_attn.v_proj.weight', 'model.decoder.layers.4.encoder_attn.v_proj.bias', 'model.decoder.layers.4.encoder_attn.q_proj.weight', 'model.decoder.layers.4.encoder_attn.q_proj.bias', 'model.decoder.layers.4.encoder_attn.out_proj.weight', 'model.decoder.layers.4.encoder_attn.out_proj.bias', 'model.decoder.layers.4.encoder_attn_layer_norm.weight', 'model.decoder.layers.4.encoder_attn_layer_norm.bias', 'model.decoder.layers.4.fc1.weight', 'model.decoder.layers.4.fc1.bias', 'model.decoder.layers.4.fc2.weight', 'model.decoder.layers.4.fc2.bias', 'model.decoder.layers.4.final_layer_norm.weight', 'model.decoder.layers.4.final_layer_norm.bias', 'model.decoder.layers.5.self_attn.k_proj.weight', 'model.decoder.layers.5.self_attn.k_proj.bias', 'model.decoder.layers.5.self_attn.v_proj.weight', 'model.decoder.layers.5.self_attn.v_proj.bias', 'model.decoder.layers.5.self_attn.q_proj.weight', 'model.decoder.layers.5.self_attn.q_proj.bias', 'model.decoder.layers.5.self_attn.out_proj.weight', 'model.decoder.layers.5.self_attn.out_proj.bias', 'model.decoder.layers.5.self_attn_layer_norm.weight', 'model.decoder.layers.5.self_attn_layer_norm.bias', 'model.decoder.layers.5.encoder_attn.k_proj.weight', 'model.decoder.layers.5.encoder_attn.k_proj.bias', 'model.decoder.layers.5.encoder_attn.v_proj.weight', 'model.decoder.layers.5.encoder_attn.v_proj.bias', 'model.decoder.layers.5.encoder_attn.q_proj.weight', 'model.decoder.layers.5.encoder_attn.q_proj.bias', 'model.decoder.layers.5.encoder_attn.out_proj.weight', 'model.decoder.layers.5.encoder_attn.out_proj.bias', 'model.decoder.layers.5.encoder_attn_layer_norm.weight', 'model.decoder.layers.5.encoder_attn_layer_norm.bias', 'model.decoder.layers.5.fc1.weight', 'model.decoder.layers.5.fc1.bias', 'model.decoder.layers.5.fc2.weight', 'model.decoder.layers.5.fc2.bias', 'model.decoder.layers.5.final_layer_norm.weight', 'model.decoder.layers.5.final_layer_norm.bias', 'model.decoder.layernorm_embedding.weight', 'model.decoder.layernorm_embedding.bias', 'model.decoder.layer_norm.weight', 'model.decoder.layer_norm.bias', 'lm_head.weight'])
Remapping embeddings.
Eliminating matched params with mismatched sizes from the initial model.
Using label smoothing of 0.1
Using gradient clipping norm of 1.0
Using softmax temperature of 1.0
Training for: ['hi-hi']
Corpora stats: {'hi-hi': 100}
Shuffling corpus: hi-hi
Keyword arguments {'sample': False, 'nbest': 64, 'alpha_or_dropout': 0.1} not recognized.
[the warning above is printed 8 times]
Running eval on dev set(s)
Decoding batch from a pool of 100 examples
Traceback (most recent call last):
  File "/content/yanmtt/train_nmt.py", line 884, in <module>
    run_demo()
  File "/content/yanmtt/train_nmt.py", line 881, in run_demo
    mp.spawn(model_create_load_run_save, nprocs=args.gpus, args=(args,train_files, dev_files, quit_condition))         #
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/content/yanmtt/train_nmt.py", line 313, in model_create_load_run_save
    translations = model.module.generate(dev_input_ids, use_cache=True, num_beams=1, max_length=int((len(dev_input_ids[0])*args.max_decode_length_multiplier) if args.max_decode_length_multiplier > 0 else -args.max_decode_length_multiplier), min_length=int((len(dev_input_ids[0])*args.min_decode_length_multiplier) if args.min_decode_length_multiplier > 0 else -args.min_decode_length_multiplier), early_stopping=True, attention_mask=dev_input_masks, pad_token_id=tok.pad_token_id, eos_token_id=tok(["</s>"], add_special_tokens=False).input_ids[0][0], decoder_start_token_id=tok([tlang if args.use_official_pretrained else "<2"+tlang+">"], add_special_tokens=False).input_ids[0][0], bos_token_id=tok(["<s>"], add_special_tokens=False).input_ids[0][0], length_penalty=args.length_penalty, repetition_penalty=args.repetition_penalty, encoder_no_repeat_ngram_size=args.encoder_no_repeat_ngram_size, no_repeat_ngram_size=args.no_repeat_ngram_size, additional_input_ids=dev_input_ids_parent if args.multi_source else None, additional_input_ids_mask=dev_input_masks_parent if args.multi_source else None) ## We translate the batch.
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/generation_utils.py", line 922, in generate
    model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(input_ids, model_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/generation_utils.py", line 417, in _prepare_encoder_decoder_kwargs_for_generation
    model_kwargs["encoder_outputs"]: ModelOutput = encoder(input_ids, return_dict=True, **encoder_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'additional_input_ids'
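
From the traceback, my understanding is that `generate()` forwards any keyword arguments it does not consume to the encoder's `forward()`, and the `forward()` of the transformers build installed on Colab has no `additional_input_ids` parameter (train_nmt.py passes it even when it is None, since I did not set `--multi_source`). Below is a minimal sketch of what I believe is happening; it uses stock Hugging Face BART (facebook/bart-base) rather than IndicBART/YANMTT, and assumes a transformers version from the same 4.x line as in the traceback:

# Minimal sketch of the failure as I understand it, using stock
# Hugging Face BART instead of the YANMTT IndicBART model.
from transformers import BartForConditionalGeneration, BartTokenizer

tok = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

batch = tok(["a test sentence"], return_tensors="pt")

# This works: every kwarg is one that forward() defines.
model.generate(batch.input_ids, attention_mask=batch.attention_mask, num_beams=1)

# This raises the same TypeError: generate() forwards the unknown kwarg to
# the encoder's forward(), which has no `additional_input_ids` parameter.
model.generate(batch.input_ids, additional_input_ids=None)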

Any solutions for this? It is for a summarization task.
Thanks in advance.