I used this Colab:
to fine-tune LayoutLMv2ForTokenClassification on the CORD dataset.
here is the result:
- F1: 0.9665
and the results are indeed impressive on the test set.
However, when I run the model on any other receipt (printed or PDF), the predictions are completely off.
So for some reason the model is overfitting to the CORD dataset, even though the images I test with look similar.
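One thing worth double-checking before blaming overfitting: LayoutLM-family models expect bounding boxes normalized to a 0–1000 coordinate space, and feeding raw pixel coordinates from your own OCR output produces exactly this kind of "completely off" behavior. A minimal sketch of the normalization (the helper name is my own, not from the notebook):

```python
def normalize_box(box, width, height):
    """Scale a pixel-space box (x0, y0, x1, y1) into the
    0-1000 coordinate space LayoutLM models expect."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]

# Example: a box from a 400x200 px receipt image
print(normalize_box((100, 50, 200, 100), 400, 200))  # → [250, 250, 500, 500]
```

If you use `LayoutLMv2Processor` with its built-in OCR this is handled for you, but with custom OCR (e.g. on PDFs rendered at a different resolution) it is easy to skip.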
I don’t think there is data leakage, unless the CORD dataset itself is not clean (which I assume it is).
What could be the reason for this?
Is it some inherent property of LayoutLM?
The LayoutLM models are somewhat old, and the project seems abandoned…
I don’t have much experience, so I would appreciate any pointers.
Thanks