I have been trying to figure out how this squares with the idea that there should be one merge per new token. In practice, the reason there are more merges for the Llama tokenizer is the way the model is converted from the tokenizer.model file into a tokenizer.json file (which contains both the merges and the vocab).
The conversion script goes through each word in the vocab and lists the merge candidates, i.e. all of the possible merges that could have created the word. That means any split where both subwords are also in the vocab counts (try this yourself and you'll see the merges line up). This is done because it is not possible to know which of the candidate merges was actually used to create that word.
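Here is a minimal sketch of that candidate enumeration, assuming `vocab` is a dict mapping token strings to ids; the function name and structure are illustrative, not the actual conversion script's API:

```python
def merge_candidates(vocab: dict[str, int]) -> list[tuple[str, str]]:
    """List every (left, right) pair in the vocab whose concatenation is also in the vocab."""
    merges = []
    for token in vocab:
        if len(token) < 2:
            continue  # single characters cannot be the result of a merge
        for i in range(1, len(token)):
            left, right = token[:i], token[i:]
            # A split is a candidate merge only if both halves are themselves tokens.
            if left in vocab and right in vocab:
                merges.append((left, right))
    return merges

# Example: the token "low" yields two candidates, ("l", "ow") and ("lo", "w"),
# since we cannot tell which merge actually produced "low" during training.
vocab = {"l": 0, "o": 1, "w": 2, "lo": 3, "ow": 4, "low": 5}
print(merge_candidates(vocab))
```

This is why a single vocab entry can contribute several merges, and hence why the merge list ends up longer than one-merge-per-new-token would suggest.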