In our case, we consider “hug” because it is a strict substring of “hugs”. The notion of a strict substring is used here only to select the initial tokens for this toy example (in a real use case, we would build the initial vocabulary with a BPE algorithm, for example). We then count how often each token appears, regardless of whether a given occurrence is a strict substring or the whole word.
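To make the two steps concrete, here is a minimal sketch. The corpus `word_freqs` and the helper `strict_substrings` are my own illustrative assumptions, not code from the course; the point is only that the strict-substring rule decides which tokens enter the vocabulary, while the counting step looks at every occurrence.

```python
from collections import Counter

# Hypothetical toy corpus (an assumption): word -> number of occurrences.
word_freqs = {"hug": 10, "pug": 5, "pun": 12, "bun": 4, "hugs": 5}


def strict_substrings(word):
    """Return every substring of `word` except `word` itself."""
    n = len(word)
    return {
        word[i:j]
        for i in range(n)
        for j in range(i + 1, n + 1)
        if j - i < n  # skip the full word
    }


# Step 1: the initial vocabulary is made of strict substrings only.
vocab = set()
for word in word_freqs:
    vocab |= strict_substrings(word)

# Step 2: count every occurrence of each token, including the case where
# the token is the whole word (e.g. "hug" inside the word "hug").
token_freqs = Counter()
for word, freq in word_freqs.items():
    for token in vocab:
        token_freqs[token] += word.count(token) * freq

print(sorted(vocab, key=len))
print(token_freqs["hug"])
```

With this corpus, “hug” enters the vocabulary only because of “hugs”, but its count also includes the occurrences of the word “hug” itself. “pug” never enters the vocabulary, since no word in the corpus contains it as a strict substring.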
Hello!
How do I do the “Try it out!” exercise: compute the start and end indices for the five most likely answers?