What I meant was that on the model card they set that to be the end-of-sequence token within the model's configuration (NOT in the prompt).
The model then generates this token itself and stops generating.
The logic is basically that the model keeps generating new tokens based on the previous ones until it generates the end-of-sequence token OR hits the token limit, which I think defaults to 256 in this case (though I could be wrong on the exact number).
I’m unsure whether SageMaker takes care of this for you or whether you will need to set it yourself.
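
To make the stopping logic above concrete, here is a rough sketch in plain Python. It is not the actual library or SageMaker code: `generate`, `toy_next_token` and the token ids are made up for illustration, and the 256 default is just the number I guessed above.

```python
from typing import Callable, List

EOS_TOKEN_ID = 2  # hypothetical id; the real one comes from the model's configuration


def generate(
    prompt_ids: List[int],
    next_token: Callable[[List[int]], int],
    eos_token_id: int = EOS_TOKEN_ID,
    max_new_tokens: int = 256,  # assumed default, may differ in practice
) -> List[int]:
    """Generate tokens until EOS is produced or the token limit is hit."""
    generated: List[int] = []
    while len(generated) < max_new_tokens:
        # The next token is predicted from the prompt plus everything generated so far.
        token = next_token(prompt_ids + generated)
        generated.append(token)
        if token == eos_token_id:
            # The model emitted the end-of-sequence token itself, so stop.
            break
    return generated


def toy_next_token(context: List[int]) -> int:
    """Stand-in for a real model: emits EOS once the context has 5 tokens."""
    return EOS_TOKEN_ID if len(context) >= 5 else 7


stopped_by_eos = generate([1, 1, 1], toy_next_token)
stopped_by_limit = generate([1, 1, 1], toy_next_token, max_new_tokens=2)
print(stopped_by_eos)    # ends with the EOS id
print(stopped_by_limit)  # cut off before EOS was generated
```

The point is that the EOS id is a setting the loop checks against, not something you put in the prompt. If the configured id does not match what the model actually emits, the loop only ever stops at the token limit.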