What is the purpose of 'use_cache' in the decoder?

Would this cache also be used if I call the `generate` method multiple times with the same conditional text as input?
I'd like to see the intermediate results of the prediction, but I don't want to recompute the hidden states more often than necessary.
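A toy decoding loop can show the general idea of a key/value cache. This is a minimal sketch, not the transformers implementation. `project_kv` and `predict_next` are hypothetical stand-ins for a decoder layer's key/value projections and its attention step. With the cache enabled, each step computes keys/values only for the newest token. Without it, every step recomputes them for the whole sequence. In this toy the cache lives inside a single `generate` call, so a second call with the same prompt starts again from an empty cache.

```python
from typing import List, Tuple

KV = Tuple[float, float]


def project_kv(token: int) -> KV:
    """Hypothetical stand-in for one token's key/value projection.

    In a real decoder this is a per-layer matrix multiply; here it is a cheap
    function so the sketch stays self-contained.
    """
    return float(token), float(token) * 0.5


def predict_next(kvs: List[KV]) -> int:
    """Toy 'attention' step: reduce all keys/values to a next-token id."""
    return int(sum(k * v for k, v in kvs)) % 50


def generate(prompt: List[int], steps: int, use_cache: bool = True) -> Tuple[List[int], int]:
    """Toy greedy decoding loop; returns (tokens, number of projections computed)."""
    tokens = list(prompt)
    cache: List[KV] = []  # exists only for the duration of this call
    projections = 0
    for _ in range(steps):
        if use_cache:
            # Project only tokens not yet cached: the whole prompt on the
            # first step, then just the token produced on the previous step.
            for tok in tokens[len(cache):]:
                cache.append(project_kv(tok))
                projections += 1
            kvs = cache
        else:
            # Without a cache, every step re-projects the full sequence.
            kvs = [project_kv(tok) for tok in tokens]
            projections += len(tokens)
        tokens.append(predict_next(kvs))
    return tokens, projections


cached, n_cached = generate([3, 7, 1], steps=4, use_cache=True)
uncached, n_uncached = generate([3, 7, 1], steps=4, use_cache=False)
print(cached == uncached, n_cached, n_uncached)
```

Running it prints `True 6 18`. Both modes produce the same tokens, but the cached loop computes 6 key/value projections and the uncached loop computes 18. The gap widens as the output gets longer, because the uncached work grows quadratically with the number of generated tokens.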
