Different Summary Outputs Locally vs API for the Same Text

Thanks @Narsil. I’m currently using @philschmid’s amazing model via the Inference API to summarise transcripts.

My team highlighted a number of errors in the output, where the model was hallucinating a 3rd person in a 2-person call. To debug this, I ran the model manually in a Colab notebook using exactly the same parameters. I realised that I don’t get the error when I load the model manually, whereas I do get it when I run it through the pipeline.

I understand that models can generate in a non-deterministic way, but what’s confusing to me is that the outputs from the manually loaded model and from the pipeline never change. They’re always exactly the same (re-ran 20 times), and the pipeline output always has the hallucination whilst the manual one does not. I’m looking to understand what causes the difference in output, and whether there’s a variable I set in my experiments that gets overridden in the API. I tried with and without model.eval() and torch.inference_mode(), and the manually loaded model’s output didn’t change.
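For reference, this is roughly how I’m comparing the two paths. It’s a minimal sketch: the checkpoint name is a placeholder for the actual model I’m using, and I’ve omitted my real generation parameters and transcript.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

model_id = "philschmid/bart-large-cnn-samsum"  # placeholder: swap in the actual checkpoint
transcript = "..."  # redacted transcript text

# Path 1: manual loading and generation
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
model.eval()

inputs = tokenizer(transcript, return_tensors="pt", truncation=True)
with torch.inference_mode():
    output_ids = model.generate(**inputs)
manual_summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Path 2: pipeline loading (closer to what the hosted Inference API does)
summarizer = pipeline("summarization", model=model_id)
pipeline_summary = summarizer(transcript)[0]["summary_text"]

print("manual:  ", manual_summary)
print("pipeline:", pipeline_summary)
```

With the same parameters passed to both, I’d expect identical summaries, but only the pipeline path reproduces the hallucination.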

I can’t share the data, but I can mask it and share a redacted version here if that helps:

Manual model loading output (Plausible summary):

Nicolas and Phil are going to give their team access to the AI summarizing some calls next week.

Pipeline model loading output (Improbable summary):

Nicolas, Phil and HuggingFace are going to give Nicolas access to the AI summarizing some calls next week.