This question is conceptual in nature.
Suppose I’m working on a text classification problem with 3 labels. To make the problem more concrete, let’s say I’m working on sentiment analysis with ground-truth labels positive, neutral, and negative. I am measuring accuracy and macro-F1.
Now I’d like to make another data set with 5 ground-truth labels: very positive, positive, neutral, negative, and very negative. Intuitively, I would think that the 5-label classification problem is more difficult than the 3-label problem, but the only “proof” I can think of is that a random guess is correct only 1/5 of the time with 5 labels, whereas a random guess is correct 1/3 of the time with 3 labels.
Is there a more formal machine learning argument for why a 5-label problem is more difficult than a 3-label problem? More generally, how does an N-label problem compare to an M-label problem where M > N?
I’m willing to brush up on Vapnik–Chervonenkis theory if that’s needed (hopefully not).