TypeError: decode() got an unexpected keyword argument 'clean_up_tokenization_spaces'

I’m using this code to try an Indonesian mBART pretrained model, which I’m going to use for summarization.

from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer, pipeline
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq
from indobenchmark import IndoNLGTokenizer
import time
import tensorflow as tf
import os, re, logging
import pandas as pd

model1 = "indobenchmark/indobart-v2"
print(model1)

tokenizer = IndoNLGTokenizer.from_pretrained("indobenchmark/indobart-v2")
model = AutoModelForSeq2SeqLM.from_pretrained(model1)

summarizer = pipeline(
    "summarization", model=model, tokenizer=tokenizer, 
    num_beams=5, do_sample=True, no_repeat_ngram_size=3
)

summarizer(
    "some indonesian article",
    min_length=20,
    max_length=144,
)

The code throws this error:

TypeError: decode() got an unexpected keyword argument 'clean_up_tokenization_spaces'

Complete stack trace: err mbart indobenchmark · GitHub. Note that the error appears when running in a Jupyter notebook.

I always get this kind of error when running inference through pipeline. What’s actually going wrong here?

I also tried generating a summary with the following method, and it runs normally.

def sumt5m(model, pr):
    # Tokenize the article (uses the globally defined tokenizer)
    input_ids = tokenizer.encode(pr, max_length=10240, return_tensors='pt')
    # Generate the summary with beam search
    summary_ids = model.generate(input_ids,
                                 max_length=100,
                                 num_beams=2,
                                 repetition_penalty=2.5,
                                 length_penalty=1.0,
                                 early_stopping=True,
                                 no_repeat_ngram_size=2,
                                 use_cache=True)
    # Decode the generated ids back into text
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
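
For reference, I call it like this (the article string here is just a hypothetical placeholder):

article = "some indonesian article"  # hypothetical input text
print(sumt5m(model, article))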

It turned out I was using a custom tokenizer from indobenchmark, and its code is a bit outdated: its decode() override doesn’t accept the clean_up_tokenization_spaces argument that newer versions of transformers pass during pipeline inference. I fixed it by adding the clean_up_tokenization_spaces argument so the method correctly overrides decode() from the PreTrainedTokenizer base class, like this:

    def decode(self, inputs, skip_special_tokens=False, clean_up_tokenization_spaces: bool = True):
        outputs = super().decode(inputs, skip_special_tokens=skip_special_tokens,
                                 clean_up_tokenization_spaces=clean_up_tokenization_spaces)
        # Undo SentencePiece tokenization: drop plain spaces and turn '▁' markers back into spaces
        return outputs.replace(' ', '').replace('▁', ' ')
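
A slightly more future-proof variant (just a sketch, assuming you are editing or subclassing IndoNLGTokenizer yourself) forwards any extra keyword arguments to the base class, so new decode() arguments introduced in later transformers releases don’t break the override again:

    def decode(self, inputs, skip_special_tokens=False, **kwargs):
        # Forward all extra keyword arguments (e.g. clean_up_tokenization_spaces)
        # to PreTrainedTokenizer.decode so newer transformers versions keep working
        outputs = super().decode(inputs, skip_special_tokens=skip_special_tokens, **kwargs)
        # Same post-processing as the original tokenizer: drop plain spaces and
        # turn the SentencePiece '▁' marker back into spaces
        return outputs.replace(' ', '').replace('▁', ' ')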