Different results from the Inference API vs. local transformers

Hi there,

so I found this model here: Einmalumdiewelt/T5-Base_GNAD · Hugging Face and wanted to use it to summarize texts. So I wrote the following Python code:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Einmalumdiewelt/T5-Base_GNAD")
model = AutoModelForSeq2SeqLM.from_pretrained("Einmalumdiewelt/T5-Base_GNAD")

# Tokenize the input text (is_split_into_words only applies to
# pre-tokenized word lists, so it is dropped for a raw string)
inputs = tokenizer([prepared_text], max_length=1024, truncation=True,
                   return_tensors="pt")

# Generate summary
summary_ids = model.generate(inputs["input_ids"], num_beams=2, min_length=0,
                             max_length=1024)
summary = tokenizer.batch_decode(summary_ids, skip_special_tokens=True,
                                 clean_up_tokenization_spaces=False)[0]
print(summary)

and it worked well. Then I compared it to the results given by the Inference API for the same model, and I got pretty different results. Actually, the results from the API were better.
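For reference, here is a minimal sketch of how I query the hosted Inference API, passing generation parameters explicitly so the comparison with the local call is fair. The URL follows the standard `api-inference.huggingface.co/models/<repo>` pattern; the token placeholder is of course not a real token, and only parameters documented for the summarization task (min_length, max_length) are set:

```python
import json

API_URL = "https://api-inference.huggingface.co/models/Einmalumdiewelt/T5-Base_GNAD"

# Build the request body; "parameters" lets you override the model's
# default generation settings, analogous to the arguments of generate()
payload = {
    "inputs": "Hier steht der zu zusammenfassende Text.",
    "parameters": {"min_length": 0, "max_length": 1024},
}
body = json.dumps(payload)
print(body)

# To actually send it (requires a valid token):
# import requests
# headers = {"Authorization": "Bearer hf_xxx"}
# response = requests.post(API_URL, headers=headers, data=body)
# print(response.json())
```

Without an explicit "parameters" entry, the API falls back to the model's own default generation settings (e.g. whatever is stored in its config on the Hub), which may not match the `num_beams=2, max_length=1024` I pass locally.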

Could someone explain to me what the difference is, and why I get different results? I would rather not use the API if I can just run the model locally, but the API does give me better results…

Thank you in advance!