Hi
I just finished reading this tutorial on Unigram,
and I have a question about this line in the function encode_word():
best_segmentations = [{"start": 0, "score": 1}] + [
{"start": None, "score": None} for _ in range(len(word))
]
As I understand it, the score in the dictionary in the first list should be log(1), not 1,
because in this line
score = model[token] + best_score_at_start
we are summing log probabilities.
So I suspect that [{"start": 0, "score": 1}]
should be [{"start": 0, "score": 0}]
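To check this, here is a minimal sketch of the same loop with the starting score exposed as a hypothetical initial_score parameter (it is not in the tutorial). It assumes that model[token] stores a negative log probability, so that lower scores are better, and the token probabilities are made up for illustration.

```python
from math import log

def encode_word(word, model, initial_score):
    # Simplified sketch of the tutorial's loop; initial_score is a
    # hypothetical parameter added only to compare 1 against 0.
    best = [{"start": 0, "score": initial_score}] + [
        {"start": None, "score": None} for _ in range(len(word))
    ]
    for start_idx in range(len(word)):
        best_score_at_start = best[start_idx]["score"]
        if best_score_at_start is None:
            continue  # no segmentation reaches this position
        for end_idx in range(start_idx + 1, len(word) + 1):
            token = word[start_idx:end_idx]
            if token in model:
                score = model[token] + best_score_at_start
                # Assumption: scores are negative log probabilities, lower is better
                if best[end_idx]["score"] is None or score < best[end_idx]["score"]:
                    best[end_idx] = {"start": start_idx, "score": score}
    if best[-1]["score"] is None:
        return None, None  # the word cannot be tokenized with this vocabulary
    # Walk back from the end of the word to recover the tokens
    tokens, end = [], len(word)
    while end > 0:
        start = best[end]["start"]
        tokens.insert(0, word[start:end])
        end = start
    return tokens, best[-1]["score"]

# Made-up probabilities, for illustration only
probs = {"h": 0.1, "u": 0.1, "g": 0.1, "hu": 0.2, "ug": 0.3, "hug": 0.2}
model = {token: -log(p) for token, p in probs.items()}

tokens_1, score_1 = encode_word("hug", model, initial_score=1)
tokens_0, score_0 = encode_word("hug", model, initial_score=0)
print(tokens_1, score_1)
print(tokens_0, score_0)
```

If that assumption holds, the starting value only shifts every candidate score by the same constant, so both runs pick the same segmentation and only the returned score differs by 1. With a start of 0, the returned score equals the summed negative log probability of the chosen tokens.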
Can someone clarify this for me?
Thanks