I just created my first LayoutLMv3 model for token classification on the FUNSD dataset. Now I would like to fine-tune it on my own version of the FUNSD dataset, but since I don't have enough documents, data augmentation comes to mind.
I need some guidance on this topic. Since text extraction from documents is a big part of this problem, I don’t think just any transformation of the original image is valid for producing a new one (resizing, blurring, or changing background colors, to name a few, could negatively affect text extraction).
Is there any data augmentation technique that I could implement safely to get new valid data?