Classification problem difficulty when going from 3 classes to 5 classes?

This question is conceptual in nature.

Suppose I’m working on a text classification problem where I have 3 labels. To make the problem more concrete, let’s say I’m working on sentiment analysis with ground-truth labels positive, neutral, and negative. I am measuring accuracy and macro-F1.

Now I’d like to make a new data set with 5 ground-truth labels: very positive, positive, neutral, negative, and very negative. Intuitively, I would think that the 5-label classification problem is more difficult than the 3-label problem, but the only “proof” I can think of is that a uniform random guess is correct only 1/5 of the time with 5 labels, versus 1/3 of the time with 3 labels.
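As a sanity check on that baseline argument, here is a small simulation (function name and sample size are my own, just for illustration) showing that a uniform random guesser against uniform labels scores about 1/K accuracy:

```python
import random

def random_guess_accuracy(n_classes, n_samples=100_000, seed=0):
    """Accuracy of uniform random guessing on uniform ground-truth labels.

    The expected value is 1 / n_classes, since guess and label are
    independent uniform draws over the same n_classes labels.
    """
    rng = random.Random(seed)
    hits = sum(
        rng.randrange(n_classes) == rng.randrange(n_classes)
        for _ in range(n_samples)
    )
    return hits / n_samples

print(random_guess_accuracy(3))  # close to 1/3
print(random_guess_accuracy(5))  # close to 1/5
```

This only formalizes the chance-level baseline, of course, not the difficulty of the learning problem itself, which is what I’m asking about.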

Is there a more formal machine learning argument for why a 5-label problem is more difficult than a 3-label one? And more generally, for going from an N-label problem to an M-label problem where M > N?

I’m willing to brush up on Vapnik–Chervonenkis theory if that’s needed (hopefully not).

Any help, intuition, hints, pointers, or references would be appreciated.