Hello everyone, I am currently trying to implement a transformer with a custom attention mechanism, which is described on page 4 of this link. The authors used Hugging Face for their implementation, but I am not sure how to approach the problem or how to implement custom attention with Hugging Face. Can anybody guide me on how to go about this? Thanks,
Hey @iakarshu my best guess is that the authors implemented DocFormer from scratch, so as far as I can tell you can’t do some clever subclassing of an existing model to tweak the attention layers.
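For what it's worth, here is a minimal sketch of what "from scratch" could look like for the attention layer itself, written as a plain PyTorch module. To be clear, this is not DocFormer's attention: it is ordinary multi-head self-attention with one optional additive term on the attention scores, which is one place a custom term could plug in. The class name `CustomSelfAttention` and the `extra_scores` argument are hypothetical names chosen for illustration.

```python
import math

import torch
from torch import nn


class CustomSelfAttention(nn.Module):
    """Standard multi-head self-attention with a slot for an extra score term.

    Illustrative only: `extra_scores` is a hypothetical hook, not the
    attention formulation of any particular paper.
    """

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        if hidden_size % num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, hidden_size)

    def _split_heads(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, seq, hidden) -> (batch, heads, seq, head_dim)
        batch, seq_len, _ = x.shape
        x = x.reshape(batch, seq_len, self.num_heads, self.head_dim)
        return x.transpose(1, 2)

    def forward(self, hidden_states, extra_scores=None):
        q = self._split_heads(self.query(hidden_states))
        k = self._split_heads(self.key(hidden_states))
        v = self._split_heads(self.value(hidden_states))

        # Scaled dot-product scores: (batch, heads, seq, seq)
        scores = q @ k.transpose(-1, -2) / math.sqrt(self.head_dim)

        # Custom terms would be added here, before the softmax.
        # `extra_scores` must broadcast to (batch, heads, seq, seq).
        if extra_scores is not None:
            scores = scores + extra_scores

        weights = scores.softmax(dim=-1)
        context = weights @ v  # (batch, heads, seq, head_dim)

        # Merge heads back: (batch, seq, hidden)
        batch, _, seq_len, _ = context.shape
        context = context.transpose(1, 2).reshape(batch, seq_len, -1)
        return self.out(context)


layer = CustomSelfAttention(hidden_size=64, num_heads=4)
x = torch.randn(2, 10, 64)
bias = torch.zeros(1, 1, 10, 10)  # placeholder for a custom score term
output = layer(x, extra_scores=bias)
```

If you want `save_pretrained` / `from_pretrained` support, a layer like this could sit inside a model that subclasses `transformers.PreTrainedModel` with its own `PretrainedConfig` subclass, but that wiring is left out of the sketch.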
Do you know if AWS open-sourced the pretrained weights of DocFormer? Without them, you might need a lot of compute to build a useful model.
Hope that helps!
Hey @lewtun, thanks a lot for sharing this. I will probably focus on implementing it from scratch and learn from the LayoutLMv2 implementation along the way. As for compute, I have some resources, namely an NVIDIA DGX to work with. I have been searching for open-source DocFormer code but have not found any. I emailed the author, who declined to share the code, so I don’t think they have open-sourced it. Again, thanks a lot for replying.
@iakarshu is thinking about having a go at implementing and pretraining it (because the authors didn’t release code or weights), so I thought it would be good to double-check that you don’t do the same work twice.
No it’s not on my list, seems interesting.
However, if there are no pre-trained weights available (and even no code), then there’s a low chance for me to add it to the library.