HF Inference Endpoints: Difference between Max Input Length per Query and Max Token Length per Query

Can anyone explain the difference between these two settings?

Found it – these are explained in more detail in the TGI documentation here
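For anyone landing here later: in TGI terms these appear to map to max_input_length (the maximum number of tokens allowed in the prompt of a single request) and max_total_tokens (the maximum of prompt tokens plus generated tokens combined). Below is a minimal sketch assuming those semantics; the limit values and the gpt2 tokenizer are placeholders, not the actual defaults of any endpoint.

```python
# Sketch of how the two limits interact (assumed TGI-style semantics):
# max_input_length caps the prompt, max_total_tokens caps prompt + generation.
from transformers import AutoTokenizer

MAX_INPUT_LENGTH = 1024   # "Max Input Length per Query" (assumed mapping)
MAX_TOTAL_TOKENS = 1512   # "Max Token Length per Query" (assumed mapping)

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer

prompt = "Explain the difference between the two token limits."
prompt_tokens = len(tokenizer(prompt)["input_ids"])

if prompt_tokens > MAX_INPUT_LENGTH:
    raise ValueError(
        f"Prompt has {prompt_tokens} tokens, over the input limit of {MAX_INPUT_LENGTH}."
    )

# Whatever remains under the total budget is available for generation.
max_new_tokens = MAX_TOTAL_TOKENS - prompt_tokens
print(f"Prompt uses {prompt_tokens} tokens; up to {max_new_tokens} can be generated.")
```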
