I’m trying to extract medical information from PDF files using LayoutLMv3 for token classification.
I’ve successfully fine-tuned the model on a handful of label types (name, date of birth, patient ID, etc.), but now I want to scale up to around 80 different labels.
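For reference, my current setup looks roughly like this (the label list is just a placeholder; the real one will have ~80 entries):

```python
from transformers import LayoutLMv3ForTokenClassification, LayoutLMv3Processor

# Placeholder label list -- the real set has around 80 entries.
labels = ["O", "B-NAME", "I-NAME", "B-DOB", "I-DOB", "B-PATIENT_ID", "I-PATIENT_ID"]
id2label = {i: label for i, label in enumerate(labels)}
label2id = {label: i for i, label in id2label.items()}

# apply_ocr=False because words and boxes come from my own PDF extraction.
processor = LayoutLMv3Processor.from_pretrained(
    "microsoft/layoutlmv3-base", apply_ocr=False
)
model = LayoutLMv3ForTokenClassification.from_pretrained(
    "microsoft/layoutlmv3-base",
    num_labels=len(labels),
    id2label=id2label,
    label2id=label2id,
)
```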
I’m wondering whether it’s better to train one model for all the labels or to split the task across multiple specialized models (say, around 10 labels each).
Has anyone encountered a similar situation or have advice on the best approach? Any experiences would be greatly appreciated. Thanks in advance!
whether it’s better to train one model for all the labels or to split the task across multiple specialized models (say, around 10 labels each)
Looking at the datasets used to train LayoutLMv2, label sets of fewer than about 20 classes seem to be the norm, and I’d expect v3 to behave similarly.
Small models often struggle when asked to predict many classes at once, so splitting the work across multiple specialized models is the safer route. And even if you keep training a single model, it’s worth saving your current working weights somewhere before scaling up.
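If you do split things up, here’s a minimal sketch of what multi-model inference could look like (the checkpoint paths, label groups, and merge rule are all made up for illustration):

```python
import torch
from transformers import LayoutLMv3ForTokenClassification, LayoutLMv3Processor

# Hypothetical checkpoint directories, one specialist per group of ~10 labels.
SPECIALIST_DIRS = ["ckpts/demographics", "ckpts/identifiers", "ckpts/clinical"]

processor = LayoutLMv3Processor.from_pretrained(
    "microsoft/layoutlmv3-base", apply_ocr=False
)

def predict_page(image, words, boxes):
    """Run every specialist on the same page; for each token keep the
    label predicted with the highest confidence across all models."""
    enc = processor(image, words, boxes=boxes, return_tensors="pt")
    best_scores, best_labels = None, None
    for ckpt in SPECIALIST_DIRS:
        model = LayoutLMv3ForTokenClassification.from_pretrained(ckpt)
        model.eval()
        with torch.no_grad():
            probs = model(**enc).logits[0].softmax(-1)  # (seq_len, n_labels)
        scores, ids = probs.max(-1)
        names = [model.config.id2label[i] for i in ids.tolist()]
        if best_scores is None:
            best_scores, best_labels = scores, names
        else:
            keep_new = scores > best_scores
            best_scores = torch.where(keep_new, scores, best_scores)
            best_labels = [n if k else o
                           for n, o, k in zip(names, best_labels, keep_new.tolist())]
    return best_labels

# Whichever route you take, checkpoint known-good weights before retraining:
# model.save_pretrained("ckpts/known-good-v1")
```

Note the highest-confidence merge here is deliberately naive: every specialist predicts “O” for tokens outside its own label group, so a real merging rule would need to treat “O” specially.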