Hi @philschmid. I’m having difficulty understanding your sentence. When I have a model (generative or not), the same input text, and of course the same values for the arguments of the generate() method (num_beams, etc.), I do not understand how the output (i.e., the computation by the model) could differ.
I just published a simple Colab notebook (generate_method_T5.ipynb) and ran the generate() method 1000 times with the same input: the output is always the same (both PyTorch and pipeline()).
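To illustrate why I expect this, here is a minimal sketch (a toy stand-in, not the real transformers API): with no sampling enabled, decoding is a pure function of the weights and the input, so repeated calls must return identical outputs. The toy_model and greedy_generate names are my own illustration, not library code.

```python
def greedy_generate(logits_fn, input_ids, max_new_tokens=5, eos_id=0):
    """Minimal greedy decoding loop: always pick the argmax token."""
    ids = list(input_ids)
    for _ in range(max_new_tokens):
        scores = logits_fn(ids)                      # deterministic "model"
        next_id = max(range(len(scores)), key=scores.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids

def toy_model(ids):
    # Fixed, deterministic stand-in for a model's forward pass:
    # the score of token t depends only on the current sequence.
    return [((sum(ids) + t) * 31) % 17 for t in range(10)]

# 1000 runs with the same input collapse to a single unique output.
runs = {tuple(greedy_generate(toy_model, [3, 1])) for _ in range(1000)}
assert len(runs) == 1
```

This matches what my notebook shows with the real generate(): same weights plus same input plus deterministic decoding gives the same output every time.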
Here, I understand. When you convert any model to the ONNX format, the values of its parameters change slightly, and this can produce an output different from that of the corresponding PyTorch model (but again, always the same output for the same input).
As a proof of concept, in the same Colab notebook (generate_method_T5.ipynb), I used the fastt5 library to get an ONNX model from the T5 one.
From the question “When is the birthday of Pierre?”:
- the PyTorch and pipeline() models give the answer “17 February”
- and the ONNX model gives “30 years, 160 days”.
I’m fine with that (at least, I understand it).
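The mechanism can be sketched with illustrative numbers (rounding weights to one decimal here is only a coarse stand-in for the precision loss of an export or quantization step; the values are invented for the example): tiny per-weight perturbations can accumulate, flip which token gets the highest logit, and therefore change the generated text entirely.

```python
def logits(weights, features):
    # One logit per candidate token: a simple dot product per row.
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)

features = [1.0, 1.0, 1.0]

# Original weights: token 0 wins by a hair (logits 1.02 vs 1.01).
original = [[0.34, 0.34, 0.34],
            [0.26, 0.26, 0.49]]

# Simulate a lossy conversion by rounding each weight to 1 decimal.
# Row 0 becomes [0.3, 0.3, 0.3] -> 0.9; row 1 becomes [0.3, 0.3, 0.5] -> 1.1.
converted = [[round(w, 1) for w in row] for row in original]

assert argmax(logits(original, features)) == 0   # token 0 wins before
assert argmax(logits(converted, features)) == 1  # token 1 wins after
```

With greedy or beam decoding, one flipped token early in the sequence can cascade into a completely different answer, which is consistent with “17 February” vs “30 years, 160 days”.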
Yes, I’m using transformers 4.15 (I did test fastt5 with versions > 4.16, but I got an error when calling the generate() method).
Last point: in AWS SageMaker Inference, I used the PyTorch T5 model, not the ONNX one. Since I still saw a different output, does that mean the inference DLC applies some compression to T5 (ONNX or similar?) that could explain the discrepancy?