Hi @mineshj1291, the reason I asked is that the current `ORTModelForCausalLM` doesn't have `with_past` support, so it recomputes the attention over the past sequence at every generation step. If your PyTorch model reuses the precomputed past key/values, it isn't a fair comparison with the current `ORTModelForCausalLM`, which does that extra computation.
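To illustrate the point, here is a minimal numpy sketch (not Optimum's actual implementation) of why caching past key/values changes the cost: without the cache, every step re-projects the whole sequence; with it, only the new token is projected, and the attention output is identical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))  # toy projection weights

def attn(q, k, v):
    """Scaled dot-product attention: rows of q attend to all rows of k/v."""
    s = q @ k.T / np.sqrt(d)
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ v

x_past = rng.normal(size=(5, d))   # embeddings of 5 already-generated tokens
x_new = rng.normal(size=(1, d))    # embedding of the newest token

# Without with_past: each step re-projects the *entire* sequence to K/V.
x_all = np.vstack([x_past, x_new])
out_recompute = attn(x_new @ Wq, x_all @ Wk, x_all @ Wv)

# With with_past: K/V for past tokens were cached in earlier steps;
# only the new token needs a projection now.
past_k, past_v = x_past @ Wk, x_past @ Wv   # computed once, reused every step
k = np.vstack([past_k, x_new @ Wk])
v = np.vstack([past_v, x_new @ Wv])
out_cached = attn(x_new @ Wq, k, v)

# Same attention output, but the cached path does O(1) projections per step
# instead of O(sequence length).
assert np.allclose(out_recompute, out_cached)
```

The saved work grows with sequence length, which is why a PyTorch model that uses `past_key_values` looks much faster than an ONNX export that doesn't.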
By the way, if you are interested, you can follow the PR that adds `with_past` support to `ORTModelForCausalLM`. I will try to finish it this week.