I’m trying to build a model that, given a text field (e.g. a product description), classifies each example according to a taxonomy.
Consider the excerpt of the taxonomy below:
Category | Item | Brand | Model |
---|---|---|---|
Home and Kitchen | Sofa | | |
Fashion | Shoe | Nike | airforce 1 |
Fashion | Shoe | Nike | airmax |
Fashion | Shoe | | |
Fashion | Purse | | |
I’m interested in fine-tuning BERT to tackle this problem, but I’m unsure how to address the following characteristics:
- The taxonomy has variable depth
- Examples can apply to multiple rows of the taxonomy (i.e. the problem is multilabel)
- Class imbalance will be a major challenge
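To make these three points concrete, here is a rough sketch of how I imagine the targets could be encoded. It is only an assumption on my part, not something I have validated: every node of the taxonomy (each prefix of a row) becomes one label, an example gets a multi-hot vector over those labels, and a simple negatives-to-positives ratio gives a per-label weight for the imbalance. The function names `build_label_index`, `encode`, and `positive_weights` are hypothetical.

```python
# Excerpt of the taxonomy; empty cells are dropped, so paths have variable depth.
TAXONOMY = [
    ("Home and Kitchen", "Sofa"),
    ("Fashion", "Shoe", "Nike", "airforce 1"),
    ("Fashion", "Shoe", "Nike", "airmax"),
    ("Fashion", "Shoe"),
    ("Fashion", "Purse"),
]


def build_label_index(paths):
    """Map every node (each prefix of each path) to a column index."""
    nodes = sorted(
        {path[:depth] for path in paths for depth in range(1, len(path) + 1)}
    )
    return {node: i for i, node in enumerate(nodes)}


def encode(example_paths, label_index):
    """Multi-hot vector: 1 for every node on any of the example's paths."""
    vec = [0] * len(label_index)
    for path in example_paths:
        for depth in range(1, len(path) + 1):
            vec[label_index[path[:depth]]] = 1
    return vec


def positive_weights(targets):
    """Per-label weight = negatives / positives (one common heuristic).

    Labels with no positive example fall back to a weight of 1.0.
    """
    n = len(targets)
    positives = [sum(column) for column in zip(*targets)]
    return [(n - p) / p if p else 1.0 for p in positives]


label_index = build_label_index(TAXONOMY)
# An example that matches two rows of the taxonomy (multilabel).
y = encode([("Fashion", "Shoe", "Nike", "airmax"), ("Fashion", "Purse")], label_index)
```

With targets in this form, a single BERT encoder with one sigmoid output per node would be one option, and weights like these could be passed as `pos_weight` to `torch.nn.BCEWithLogitsLoss`. Whether that beats a model per level is exactly what I’m unsure about.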
I’m not sure whether I should use a single model that predicts all levels at once, one model per level (Category, Item, Brand, Model, etc.), or a nested approach in which each level’s model conditions on the prediction from the level above.
Any advice is useful!