This question is conceptual in nature.

Suppose I’m working on a text classification problem where I have 3 labels. To make the problem more concrete, let’s say I’m working on sentiment analysis with ground-truth labels `positive`, `neutral`, and `negative`. I am measuring accuracy and macro-F1.

Now I’d like to make another data set with 5 ground-truth labels: `very positive`, `positive`, `neutral`, `negative`, and `very negative`. Intuitively, I would think that the 5-label classification problem is more difficult than the 3-label problem, but the only “proof” I can think of is that a random guess is correct only 1/5 of the time with 5 labels, whereas a random guess is correct 1/3 of the time with 3 labels.
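For what it’s worth, my random-guess baseline checks out in a quick simulation (assuming uniformly distributed ground-truth labels and a uniform random guesser; the function name is my own):

```python
import random

random.seed(0)

def random_guess_accuracy(num_labels, n=100_000):
    """Estimate accuracy of a uniform random guesser against
    uniformly distributed ground-truth labels."""
    labels = list(range(num_labels))
    correct = sum(
        random.choice(labels) == random.choice(labels) for _ in range(n)
    )
    return correct / n

print(random_guess_accuracy(3))  # close to 1/3
print(random_guess_accuracy(5))  # close to 1/5
```

Of course, this only restates the chance-level argument; it doesn’t say anything about how hard the problems are for a *trained* model.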

Is there a more formal machine learning argument for why a 5-label problem is more difficult than a 3-label problem? More generally, is an M-label problem more difficult than an N-label problem whenever M > N?

I’m willing to brush up on Vapnik–Chervonenkis theory if that’s needed (hopefully not).